Drug–drug interaction prediction: databases, web servers and computational models

Abstract In clinical treatment, two or more drugs (i.e. drug combination) are simultaneously or successively used for therapy with the purpose of primarily enhancing the therapeutic efficacy or reducing drug side effects. However, inappropriate drug combination may not only fail to improve efficacy, but even lead to adverse reactions. Therefore, according to the basic principle of improving the efficacy and/or reducing adverse reactions, we should study drug–drug interactions (DDIs) comprehensively and thoroughly so as to reasonably use drug combination. In this review, we first introduced the basic conception and classification of DDIs. Further, some important publicly available databases and web servers about experimentally verified or predicted DDIs were briefly described. As an effective auxiliary tool, computational models for predicting DDIs can not only save the cost of biological experiments, but also provide relevant guidance for combination therapy to some extent. Therefore, we summarized three types of prediction models (including traditional machine learning-based models, deep learning-based models and score function-based models) proposed during recent years and discussed the advantages as well as limitations of them. Besides, we pointed out the problems that need to be solved in the future research of DDIs prediction and provided corresponding suggestions.

In pharmacology, as a class of chemical substance with known structure, drugs can produce a biological effect when they are administered to a living organism [1]. More specifically, a pharmaceutical drug, also known as a medication or medicine, is a chemical substance used to prevent or treat diseases [1]. Unlike food, for patients with different diseases, drugs may be taken into the body in different ways, such as inhalation, injection, ingestion, skin application, sublingual dissolution and so on [2]. Besides, in clinical treatment, patients usually take drugs for a limited period of time or periodically over a long period of time [3]. With the development of science and technology, the way of manufacturing drugs has changed a lot [4]. Traditionally, drugs were derived from medicinal plants, but they have also been synthesized organically in recent years [5]. As the number of drugs increases, a number of relevant databases have been constructed for further research [6,7]. For example, the latest release of DrugBank [6] (version 5.1.10, released 4 January 2023) contains 15 451 drug entries including 2740 approved small molecule drugs, 1577 approved biologics, 134 nutraceuticals and over 6717 experimental drugs.

Drug-drug interaction
In clinical treatment, two or more kinds of drugs are usually needed to cure diseases or ameliorate the symptoms of diseases or medical conditions [8]. The fundamental reason for drug combination lies in the evidence that combination therapy tends to have higher cure rates than monotherapy [9]. For instance, the combination of doxorubicin, cyclophosphamide, vincristine and prednisone is commonly implemented in cancer chemotherapy regimens [10,11]. Another example is that the combined application of anti-tuberculosis drugs not only enhances drug efficacy but also delays the emergence of drug resistance in Mycobacterium tuberculosis, which is the main cause of tuberculosis [12]. However, the risk of harmful drug-drug interactions (DDIs) increases as patients take more kinds of drugs [13]. For example, more than one-third of older Americans regularly use five or more drugs or supplements, and 15% are at risk for serious DDIs [14]. Specifically, DDIs refer to the interactions between drugs used in drug therapy or between drugs and metabolites, endogenous substances, food and diagnostic agents [15]. DDIs can either enhance the efficacy (synergistic action) or decrease the efficacy (antagonistic action) by changing the nature, intensity, duration, side effects and toxicity of drugs [16]. In other words, the consequences of DDIs include desired, insignificant (the vast majority) and harmful reactions [17]. In general, we are more concerned about harmful DDIs in terms of drug safety. To improve drug safety, it is necessary to fully understand the pharmacological action of each drug before combination therapy so as to achieve the best curative effect and the fewest adverse drug reactions [18].
Drugs in combination prescriptions would influence each other's effects and change the way they work in the human body [19]. This kind of influence can be divided into pharmaceutical interactions, pharmacodynamics (PD) interactions and pharmacokinetics (PK) interactions [19]. Pharmaceutical interactions refer to the change of drug action caused by chemical reaction due to unreasonable dispensing, that is, interactions in vitro before drugs enter the body [20]. An example of pharmaceutical interaction is that tetracycline and calcium salt injection may result in a precipitate due to the formation of a chelate under neutral or alkaline conditions [21]. PD interaction means that two drugs share the same receptor, and one drug has antagonistic, additive, synergistic or indirect pharmacological effects on the other drug [13]. For instance, because both atropine and tubocurarine reversibly bind to receptors, the combination therapy of these two drugs blocks the action of the normal physiological transmitter acetylcholine [22]. PK interactions refer to the interference caused by simultaneous or sequential use of two or more drugs in the metabolic stage (absorption, distribution, metabolism and elimination), resulting in enhanced efficacy or adverse reactions [23]. Besides, different from PD interactions, PK interactions often lead to changes in the blood concentration of the interacting drugs [23]. For example, the combination therapy of warfarin and nonsteroidal anti-inflammatory drugs would result in a PK interaction [24]. In detail, some nonsteroidal anti-inflammatory drugs can inhibit warfarin metabolism, thereby enhancing the effect of warfarin on hypoprothrombinemia and significantly increasing the risk of bleeding [24]. For those patients who are receiving combination therapy, unidentified DDIs may reduce efficacy, cause unexpected side effects or other adverse drug reactions and even endanger life [25]. The harmful reactions from DDIs include bleeding, bone marrow suppression, arrhythmias, hypotension, rhabdomyolysis, central nervous system depression, seizures, hypoglycemia, renal failure and so on [26].
Hence, paying attention to DDIs is of great significance for adopting effective combination therapy and further improving the quality of medical treatment. Moreover, in-depth understanding of the absorption, distribution, metabolism and excretion process of drugs in the body, as well as the interactions between various drugs in the body, can reduce adverse drug reactions and ensure drug safety [27]. Nevertheless, as each new drug is approved, the number of potential DDIs arising from different drug combinations grows combinatorially. In this case, it is not hard to imagine that validating potential DDIs one by one through biological experiments is very expensive and time-consuming [28]. With the rapid development of computing technology, large-scale DDI analysis and prediction have also made great strides. In the face of the huge number of available drugs, computational models can be used to screen drug combinations with a high probability of interaction [16].
As increasing evidence shows that the discovery of potential DDIs plays an important role in drug development and disease treatment, more and more researchers devote themselves to corresponding studies. In the following parts, we first reviewed the existing databases and web servers about DDIs. Then, we introduced three types of computational models and discussed their advantages and disadvantages. Furthermore, we put forward the problems that need to be solved in the future research of DDI prediction and provided corresponding suggestions.

Databases and web servers
With the rapid development of drug-related research, more and more drug-related databases and web servers have been constructed to help researchers carry out deeper studies. Next, we briefly introduced some representative databases as well as web servers and summarized them in Table 1.

DrugBank (https://www.drugbank.ca/)
The latest release of DrugBank (version 5.1.10, released on 4 January 2023) contains 15 451 drugs including 2740 approved small molecules, 1577 approved biologics (proteins, peptides, vaccines and allergenics), 134 nutraceuticals and over 6717 experimental drugs [6]. Each drug in DrugBank contains more than 200 data fields, with half of the information being devoted to introducing drugs from the aspects of chemistry, pharmacology as well as pharmacy, and the other half being devoted to recording the sequence, structure and pathway of the drug target. In addition, DrugBank records more than 1.3 million DDIs and provides a DDI Checker, through which users can check the interactions between up to five drugs at one time.

DDInter (http://ddinter.scbdd.com)
As a comprehensive database dedicated to DDI research, DDInter not only records 236 834 DDIs involving 1833 drugs but also documents detailed information about each DDI, such as mechanisms, risk levels, recommendations for drug adjustment and so on [29]. Besides, similar to DrugBank, DDInter also provides an Interaction Checker for users to check whether drugs interact with each other.

SuperDRUG2 (http://cheminfo.charite.de/superdrug2)
The SuperDRUG2 database is intended to serve as a comprehensive knowledge base of 4587 approved and marketed drugs (involving small molecules, biological products and other drugs) [30]. The annotation of drugs contains regulatory details, chemical structures (2D and 3D), dosage, biological targets, physicochemical properties, external identifiers, side effects, pharmacokinetic data and DDIs. Besides, SuperDRUG2 can be used to infer potential DDIs and further provide alternative recommendations for elderly patients.

OncoRx
OncoRx is an oncology database that documents 943 DDIs between 117 anticancer drugs (ACDs) and 166 complementary and alternative medicines (CAMs) [32]. It should be pointed out that OncoRx primarily covers PK and PD DDIs, as these two kinds of interactions explain most of the clinically relevant interactions between drugs. Besides, when users search OncoRx for DDIs, some important information is also provided, such as DDI parameters, pharmacokinetic data on ACDs and CAMs as well as characteristics of CAMs based on traditional Chinese medicine principles.

DIDB (https://www.druginteractionsolutions.org/)
The Drug Interaction Database (DIDB), which is composed of human in vitro and in vivo datasets, is designed to support the decision-making process of scientists in evaluating PK DDIs and drug safety [33]. The human in vitro datasets contain results from both metabolism and transporter studies, while the human in vivo datasets include the results of studies on organ impairment, pharmacogenetics and drug interaction, where the drug interaction results are derived from drug-drug, drug-food and drug-herb interaction studies.

DrugComb (https://drugcomb.fimm.fi/)
As an open-access data portal, DrugComb documents the standardized results of drug combination screening studies about 739 964 combinations involving 8397 drugs [7]. In addition, DrugComb provides a web server, through which users can analyze and visualize their own drug combination screening data.

DailyMed (https://dailymed.nlm.nih.gov/dailymed/aboutdailymed.cfm)
The DailyMed database contains labels for two types of drugs, namely, FDA-approved drugs (such as prescription drugs, nonprescription drugs, certain medical devices, etc.) and additional drugs regulated but not approved by the FDA (such as dietary supplements and unapproved prescription as well as nonprescription drugs, etc.). It should be pointed out that the labels of prescription drugs and biological products contain a summary of the essential scientific information for the safe and effective use of the product, such as indications, dosage, administration, adverse reactions, DDIs and so on.

Web servers
In addition to databases, there are also some online web servers that can be used to analyze or predict DDIs, such as PolySearch2 [34], DDI-CPI [35] and vNN-ADMET [36].

PolySearch2 (http://polysearch.ca)
PolySearch2, an online text-mining system, can provide relationships between biomedical entities, such as human diseases, genes, single-nucleotide polymorphisms (SNPs), proteins, drugs, metabolites and so on [34]. Specifically, for a given entity that supports retrieval, PolySearch2 returns all types of the aforementioned entities associated with this entity. In the search results, each type of entity is sorted in reverse order based on the Z-score calculated by PolySearch [37]. Synonyms of each entity as well as key sentences mined from the literature to confirm the corresponding association are also provided. It should be pointed out that, to improve the accuracy and coverage, the retrieval results presented to users are mined from well-known free-text collections (e.g. MEDLINE, PubMed and Wikipedia) and biological databases (e.g. UniProt and DrugBank).
[Table excerpt, models and key ideas: Label propagation-based model [65], implementing label propagation based on multiple similarity information; Collective PSL-based model [71], applying the hinge-loss MRFs to identify potential DDIs in the multigraph through maximum a posteriori; Random forest-based model [73], introducing the enrichment score of the targets of drugs; Logistic regression-based model [81], implementing prediction based on two interaction networks constructed based on the information about PK and PD interactions.]

DDI-CPI (http://cpi.bio-x.cn/ddi/)
Considering that a large number of DDIs are mediated by drug-protein interactions, the DDI-CPI server [35] was constructed to predict potential DDIs based on the chemical-protein interactome (CPI), which is a methodology that mimics the theoretical interactions between drugs and proteins using in silico simulations [38]. For a given drug, DDI-CPI presents the predicted probabilities of interactions between the drug and the 2515 drugs in the library of DDI-CPI.

vNN-ADMET (https://vnnadmet.bhsai.org/vnnadmet/login.xhtml)
Through the vNN-ADMET web server [36], users can obtain the absorption, distribution, metabolism, excretion and toxicity (ADMET) properties of drugs by using one of the fifteen models constructed based on the variable nearest neighbor (vNN) method [39]. For example, these models can be applied to predict some important properties of a given drug, such as cytotoxicity, mutagenicity, cardiotoxicity, DDIs, microsomal stability and drug-induced liver injury.

Computational models
As mentioned above, detecting DDIs is beneficial to clinical drug combination treatment. However, due to the high cost and long cycle of experimental methods, it is of great significance to develop effective computational models to infer potential DDIs on a large scale. During recent years, to predict unknown DDIs, researchers have built a number of computational models, which could be divided into three categories: traditional machine learning-based models, deep learning-based models and score function-based models.

Traditional machine learning-based models
Traditional machine learning algorithms have been widely used to solve complex problems in industrial application [40,41] and biological science [42–52]. Here, traditional machine learning-based prediction models mainly cover label propagation, Markov random fields (MRFs), random forest, logistic regression, support vector machine (SVM), matrix factorization, ensemble learning and so on. Traditional machine learning-based models could be used to predict DDIs on a large scale and are suitable for new drugs. However, there are still some limitations to be resolved. For example, in the models constructed based on supervised learning algorithms, unlabeled samples are treated as negative samples because of the lack of highly reliable negative samples. Besides, for the parameters involved in the traditional machine learning-based models, researchers often set the values arbitrarily rather than using search or optimization algorithms to obtain the optimal values, which limits the performance of models to some extent. Moreover, researchers tended to obtain the feature vectors of drug pairs by directly splicing the feature vectors of the corresponding drugs, so constructing more informative feature vectors is still an urgent problem to be solved.

Bayesian probabilistic method-based model
Based on the hypothesis that the smaller the minimum distance between the targets of two drugs in the PPI network constructed based on the Human Protein Reference Database [53], the greater the possibility of PD interaction between the corresponding drugs, Huang et al. [13] designed a model for PD interaction prediction by considering drug actions in the PPI network (Figure 1). Firstly, for protein $p_i$, the authors constructed the expression profile of its coding gene across 79 human tissues [54], denoted by $EP_i$ (a 79-dimensional vector). Secondly, for each protein pair ($p_i$, $p_j$) with known interaction, the authors calculated the Pearson correlation coefficient between $EP_i$ and $EP_j$ to weight the edge connecting $p_i$ and $p_j$ in the PPI network.
Besides, for each drug, the authors constructed a target-centered system consisting of the target proteins of the corresponding drug and the first-step neighboring proteins of the target proteins in the PPI network. Then, for drugs $d_i$ and $d_j$, the system connection score S-score$_{ij}$ was calculated to describe the tightness of connection between the target-centered systems of $d_i$ and $d_j$:
$$\text{S-score}_{ij} = \frac{\bar{x}_{ij} - \mu_0}{s_{ij} / \sqrt{n_{ij}}}$$
where $\bar{x}_{ij}$ and $s_{ij}$ represent the mean and SD of the weights of the edges connecting the proteins in the target-centered systems of $d_i$ and $d_j$, respectively; $n_{ij}$ denotes the number of edges connecting the two target-centered systems; and $\mu_0$ refers to the mean of all edge weights in the PPI network. It should be noted that if two target-centered systems had a common protein, an artificial edge with a weight of 1 was added between the two systems. In addition, following the previous research [55], the drug phenotypic similarity score P-score$_{ij}$ between $d_i$ and $d_j$ was calculated based on the clinical side effects of drugs. Finally, inspired by the Bayesian probabilistic model proposed by Xia et al. [56], the likelihood ratios (LRs) for drug pair ($d_i$, $d_j$) to be a true-positive DDI versus a true-negative DDI based on S-score$_{ij}$ and P-score$_{ij}$ were calculated:
$$LR(\text{S-score}_{ij}) = \frac{P(\text{S-score}_{ij} \mid \text{positive})}{P(\text{S-score}_{ij} \mid \text{negative})}, \qquad LR(\text{P-score}_{ij}) = \frac{P(\text{P-score}_{ij} \mid \text{positive})}{P(\text{P-score}_{ij} \mid \text{negative})}$$
Then, the LRs calculated based on the two independent evidences (i.e. S-score$_{ij}$ and P-score$_{ij}$) were multiplied, and the interaction probability $P_{ij}$ between $d_i$ and $d_j$ was obtained from the product $LR(\text{S-score}_{ij}) \times LR(\text{P-score}_{ij})$.
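
As a rough illustration of the scoring steps described above, the following Python sketch computes a connection score and combines two likelihood ratios. It assumes that the S-score is a z-like statistic built from the four quantities named in the text (mean, SD, number of connecting edges and the network-wide mean) and that the sample SD is used; the function names and the toy numbers are hypothetical and are not taken from Huang et al. [13].

```python
import math
from statistics import mean, stdev


def s_score(edge_weights, mu0):
    """z-like connection score between two target-centered systems.

    Assumption: score = (mean - mu0) / (SD / sqrt(n)), with the sample SD.
    """
    n = len(edge_weights)
    if n < 2:
        raise ValueError("need at least two connecting edges to estimate the SD")
    s = stdev(edge_weights)
    if s == 0:
        raise ValueError("SD of the edge weights is zero; score is undefined")
    return (mean(edge_weights) - mu0) / (s / math.sqrt(n))


def likelihood_ratio(p_given_positive, p_given_negative):
    """LR of one evidence: P(evidence | positive) / P(evidence | negative)."""
    return p_given_positive / p_given_negative


def combined_likelihood_ratio(lr_s, lr_p):
    """Multiply the LRs of two evidences that are assumed to be independent."""
    return lr_s * lr_p


# Toy usage with made-up numbers.
score = s_score([0.5, 0.7, 0.9], mu0=0.3)
combined = combined_likelihood_ratio(likelihood_ratio(0.4, 0.1),
                                     likelihood_ratio(0.3, 0.2))
```

The conditional probabilities passed to `likelihood_ratio` would in practice be estimated from binned scores of known positive and negative pairs; that estimation step is left out of the sketch.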

INDI
Instead of predicting a single type of DDIs, the model of INferring Drug Interactions (INDI) proposed by Gottlieb et al. [57] could be used to infer PD, PK and potential PK interactions (the drug pair was metabolized by the same cytochrome P450 enzyme but there was no interaction evidence between these two drugs). Firstly, the authors collected these three types of DDIs from DrugBank [6] and Drugs.com (http://drugs.com). Then, they calculated drug similarity from seven aspects, including the chemical structure [58], receptors [59], side effects [60] and anatomical therapeutic chemical (ATC) codes [61] of drugs, the sequence of drug targets [62], the distances between drug targets on the human PPI network [63] and the semantic similarity between drug targets [61], which were denoted by $S_i$ ($i$ = 1, 2, ..., 7), respectively. Further, they integrated the above multiple drug-drug similarities to obtain the features of drug pairs. Specifically, given a drug pair ($d_1$, $d_2$) without where $D_p$ represents the set consisting of drug pairs with known interactions; $i$, $j$ = 1, 2 . . .

Label propagation-based model
By integrating the information about side effects and chemical structure of drugs, Zhang et al. [65] proposed a model to predict potential DDIs based on the label propagation algorithm. Firstly, the authors downloaded the information about the label and off-label side effects of drugs from SIDER [66] and OFFSIDES [67], respectively. Besides, the chemical structure information was extracted from PubChem [68]. Secondly, for each drug, three binary feature vectors were constructed according to the information about label side effects, off-label side effects and chemical structure, respectively. Then, the Jaccard index was used to calculate the similarities between drugs, and three corresponding similarity matrices were constructed. Thirdly, the authors used the Bregmanian Bi-Stochastication (BBS) algorithm [69] to normalize the three similarity matrices, and the normalized matrices were denoted as $W_k$ ($k$ = 1, 2, 3). Besides, the adjacency matrix $A$ was constructed based on the known DDIs, and each of its rows was defined as the label of the corresponding drug. As for label propagation, in the $t$th iteration, the drug nodes absorbed the label information of the neighbor nodes at a ratio of $\mu$ and retained the original label information at a ratio of $(1 - \mu)$ to update the label. Therefore, based on the $k$th similarity matrix $W_k$, the interaction probability matrix $P_k^t$ could be obtained as follows:
$$P_k^{t} = \mu W_k P_k^{t-1} + (1 - \mu) P_k^{0}$$
where $P_k^0 = A$.
The matrix $P_k^t$ obtained after iteration convergence is the final interaction probability matrix, which could also be obtained by minimizing an objective function expressed in terms of the trace $\mathrm{tr}(\cdot)$ of a matrix and the Frobenius norm $\|\cdot\|_F$. To introduce multiple similarity information about drugs, the authors calculated the converged solution by solving a composite optimization problem, in which $\delta$ refers to the regularization parameter, $\alpha = [\alpha_1, \alpha_2, \alpha_3]$ represents the vector composed of weight coefficients and $\|\cdot\|_2$ represents the Euclidean norm. Finally, the block coordinate descent (BCD) [70] scheme was applied to calculate $P$ and $\alpha$, where each element of matrix $P$ was regarded as the interaction probability between the corresponding two drugs.
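
The iterative update described above can be sketched with NumPy for a single similarity matrix. This is a minimal illustration, assuming the update takes the standard form P ← μ·W·P + (1 − μ)·A implied by the text; the function name, the tolerance, the iteration cap and the toy matrices are hypothetical, and the multi-similarity weighting solved by BCD is not covered.

```python
import numpy as np


def propagate_labels(W, A, mu=0.5, tol=1e-8, max_iter=1000):
    """Iterate P <- mu * W @ P + (1 - mu) * A until the update is below tol.

    W: normalized drug-drug similarity matrix (n x n).
    A: adjacency matrix of known DDIs (n x n), used as the initial labels.
    """
    A = np.asarray(A, dtype=float)
    W = np.asarray(W, dtype=float)
    P = A.copy()
    for _ in range(max_iter):
        P_next = mu * (W @ P) + (1.0 - mu) * A
        if np.abs(P_next - P).max() < tol:
            return P_next
        P = P_next
    return P


# Toy usage: three drugs, one known interaction between drug 0 and drug 1.
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
P = propagate_labels(W, A, mu=0.5)
```

When the spectral radius of μW is below 1 (which holds for a bi-stochastic W and μ < 1), the iteration converges to the fixed point (1 − μ)(I − μW)⁻¹A, so the loop could be replaced by a single linear solve.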

Collective probabilistic soft logic-based model
Through the collective probabilistic soft logic (PSL) framework, Sridhar et al. [71] inferred potential DDIs based on multiple drug similarities and known DDIs. Specifically, the authors calculated seven kinds of drug similarity, including chemical structure-based [58], ligand-based [59], side effect-based [60], drug annotation-based [61], target protein sequence-based [62], PPI network-based [63] and Gene Ontology (GO)-based [61] drug similarities. Besides, by representing drugs with nodes, the authors constructed a multigraph with eight types of edges (denoting the above seven types of drug similarity and known DDIs, respectively). Then, PSL used first-order logic syntax to define templates for a special class of MRF models called hinge-loss MRFs (HL-MRFs). Finally, HL-MRFs were applied to identify potential DDIs in the multigraph through maximum a posteriori (MAP) inference [72] based on the rule that if drugs $d_i$ and $d_j$ were similar, and there was a known DDI between $d_j$ and $d_k$, then $d_i$ was likely to interact with $d_k$.
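
To make the rule above concrete, the following Python sketch evaluates the hinge-loss penalty that a single grounded rule contributes under the Łukasiewicz relaxation commonly used by PSL. It is only an illustration of the rule semantics, not the model of Sridhar et al. [71]: the function name and the example truth values are hypothetical, and rule weights and MAP inference over all groundings are omitted.

```python
def distance_to_satisfaction(sim_ij, ddi_jk, ddi_ik):
    """Hinge-loss for the grounded rule
    Similar(d_i, d_j) AND Interacts(d_j, d_k) -> Interacts(d_i, d_k).

    All arguments are soft truth values in [0, 1].
    """
    # Lukasiewicz conjunction of the two body atoms.
    body = max(0.0, sim_ij + ddi_jk - 1.0)
    # The rule is violated to the extent that the body exceeds the head.
    return max(0.0, body - ddi_ik)


# A highly similar drug with a known interaction but a low predicted value
# for the head atom is penalized; a weakly supported body is not.
strong_violation = distance_to_satisfaction(0.9, 1.0, 0.2)
no_violation = distance_to_satisfaction(0.3, 0.5, 0.0)
```

MAP inference then chooses the unknown interaction values that minimize the weighted sum of such penalties over all grounded rules.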

Random forest-based model
Liu et al. [73] proposed a random forest-based method to predict unknown DDIs. Before training the random forest model, each drug pair was represented by a feature vector derived from three aspects: chemical interaction between drugs [74], protein interactions between the targets of drugs and enrichment scores of the targets of drugs. Taking the drug pair ($d_i$, $d_j$) as an example, the feature based on the chemical interaction referred to the 'Combined_score' of the drug pair recorded in STITCH [75]. According to the protein interaction scores documented in STRING [76], the features of the drug pairs were defined based on different target protein sets and the same target protein set, respectively. Specifically, the authors took the maximum $DS_{ij}^{m}$ and average value $DS_{ij}^{a}$ of the interaction scores between the target proteins of $d_i$ and the target proteins of $d_j$ as the features built based on different target protein sets. As for the features based on the same target protein set, for the proteins in the target protein set of drug $d_i$, the authors calculated the maximum $SS_{i}^{m}$ and average value $SS_{i}^{a}$ of the interaction scores between these proteins. In the same way, the maximum $SS_{j}^{m}$ and average value $SS_{j}^{a}$ of the interaction scores between the target proteins of drug $d_j$ were obtained. Then, four features of the drug pair ($d_i$, $d_j$) were defined from $SS_{i}^{m}$, $SS_{i}^{a}$, $SS_{j}^{m}$ and $SS_{j}^{a}$. The authors also constructed the feature vectors of drug pairs based on the enrichment scores of target proteins in 229 pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) [77]. Specifically, for drugs $d_i$ and $d_j$, the authors calculated their target enrichment scores (es) in these pathways, respectively. Minimum redundancy maximum relevance [78] as well as incremental feature selection [78] were used to implement feature selection, and the final 386 features were selected for the drug pair based on the value of the Matthews correlation coefficient [79]. Finally, the random forest algorithm with its default configuration (in Weka 3.6.4 [80]) was adopted to train the prediction model.
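
The maximum and average features built from different target protein sets can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the interaction scores are held in a dictionary keyed by unordered protein pairs, missing pairs are treated as a score of 0, and the protein identifiers and scores below are made up rather than taken from STRING [76].

```python
from itertools import product


def cross_target_features(targets_i, targets_j, score):
    """Return (maximum, average) interaction score between two target sets.

    score: dict mapping frozenset({protein_a, protein_b}) to a score.
    Assumption: protein pairs absent from the dict have a score of 0.
    """
    values = [score.get(frozenset((p, q)), 0.0)
              for p, q in product(targets_i, targets_j)]
    if not values:
        return 0.0, 0.0
    return max(values), sum(values) / len(values)


# Toy usage with hypothetical protein identifiers and scores.
toy_scores = {
    frozenset(("P1", "P3")): 0.8,
    frozenset(("P2", "P3")): 0.4,
}
ds_max, ds_avg = cross_target_features(["P1", "P2"], ["P3"], toy_scores)
```

The same helper could be applied to pairs of proteins within one drug's target set to obtain the same-set maximum and average, provided that self-pairs are excluded beforehand.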

Logistic regression-based model
Based on the assumption that a query drug (Dq) tends to interact with a drug to be examined (De) if Dq is structurally similar to drugs in the interaction network of De, Takeda et al. [81] proposed a DDI prediction model relying heavily on the 2D structural similarities between Dq and all drugs in the interaction network of De. Firstly, for each De, the authors constructed two interaction networks based on the PK and PD information about it. The nodes in the network constructed based on PK information represented enzymes as well as transfer proteins associated with De and drugs associated with the above-mentioned enzymes as well as transfer proteins. It should be pointed out that the drugs in the network could be divided into three categories, that is, drugs that interact with enzymes, drugs that have pharmacogenetic associations with enzymes and drugs that are transported by transfer proteins. The nodes in the network constructed based on PD information, in turn, refer to the target protein of De, the drugs associated with the target protein, other proteins that interact with the target protein and the drugs associated with these proteins. Similarly, the drugs in this network are also divided into three categories, namely, drugs that target the target protein, drugs that have a pharmacogenetic association with the target protein and drugs that have a pharmacogenetic association with the proteins interacting with the target protein. Secondly, according to the PubChem 2D fingerprint [82] and the Tanimoto coefficient, the structural similarities between Dq and all the drugs (including De) in the two networks of De were computed. Then, they calculated the maximum similarity between Dq and each type of drugs, respectively, which, together with the similarity between Dq and De, constituted the seven-dimensional feature vector of drug pair (Dq, De). After constructing the balanced classification dataset [the number of (Dq, De) pairs with known interactions is the same as the number of pairs without interactions], the authors trained the logistic regression model by using the generalized linear models (glm) implemented in the R package caret based on the feature vectors of drug pairs [83]. Finally, they applied the trained logistic regression model to predict the potential DDIs.
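
The seven-dimensional feature vector described above can be illustrated with a short Python sketch. It assumes that each fingerprint is represented as a set of on-bit indices and that a category with no drugs contributes a similarity of 0; the function names and the toy fingerprints are hypothetical, and the original work used PubChem 2D fingerprints and trained the classifier in R.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(fp_q, fp_e, categories):
    """Feature vector of the pair (Dq, De).

    categories: six lists of fingerprints, i.e. the three drug categories of
    the PK network followed by the three drug categories of the PD network.
    Each feature is the maximum similarity between Dq and one category; the
    last feature is the similarity between Dq and De itself.
    """
    features = [max((tanimoto(fp_q, fp) for fp in group), default=0.0)
                for group in categories]
    features.append(tanimoto(fp_q, fp_e))
    return features


# Toy usage with made-up fingerprints.
dq = {1, 2, 3}
de = {2, 3, 4}
toy_categories = [[{1, 2, 3}], [{1, 9}], [], [{7, 8}], [{2, 3}], [{1, 2, 3, 4}]]
vector = pair_features(dq, de, toy_categories)
```

The resulting vectors, labeled with known interacting and non-interacting pairs, would then be passed to any logistic regression implementation.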

Positive-unlabeled learning-based model
To deal with the problem of rarely available negative samples, Hameed et al. [84] proposed a positive-unlabeled learning (PUL) method to infer potential DDIs. Initially, based on the chemical structure [6], indication [85], target protein [6] and side effect [86] information of drugs, four binary vectors $f^k$ ($k$ = 1, 2, 3, 4) were constructed as the feature vectors of the corresponding drug, respectively. Then, based on these feature vectors, two types of feature vectors of drug pairs were defined. Specifically, the first type was the Jaccard index-based similarity feature representation 1 (SFR1), where the drug similarity matrix $S^k$ ($k$ = 1, 2, 3, 4) was calculated based on the $k$th feature vector using the Jaccard index. Therefore, according to SFR1, the authors could obtain the 4D feature vector $F_{ij}^{1}$ of drug pair ($d_i$, $d_j$):
$$F_{ij}^{1} = [S_{ij}^{1}, S_{ij}^{2}, S_{ij}^{3}, S_{ij}^{4}]$$
Besides, the authors proposed the similarity feature representation 2 (SFR2) to capture the shared properties of drugs. Specifically, the authors averaged the corresponding element values of the $k$th feature vectors of $d_i$ and $d_j$ to obtain the vector $v_{ij}^{k} = (f_i^{k} + f_j^{k})/2$. Then, the authors defined the feature vector $F_{ij}^{2}$ constructed based on SFR2:
$$F_{ij}^{2} = [v_{ij}^{1}, v_{ij}^{2}, v_{ij}^{3}, v_{ij}^{4}]$$
Secondly, the authors considered 6036 drug pairs with interactions recorded in DrugBank as positive samples, and another 6036 samples were randomly selected from unlabeled samples as candidate samples. Then, they utilized the growing self-organizing maps (GSOM) clustering algorithm [87] to cluster the above samples based on SFR1. If a cluster contained only candidate samples, these candidate samples were regarded as negative samples. In a similar way, negative samples were inferred based on SFR2, and the 589 common negative samples inferred based on SFR1 and SFR2 were considered as the final negative samples. Thirdly, they randomly selected 589 samples from the positive samples as the final positive samples, which were combined with the inferred negative samples to form the training set. The authors repeated the sampling 10 times to construct 10 balanced training sets. Then, the authors applied these training sets to train SVM classifiers based on SFR1 and SFR2, respectively. Finally, the prediction results of the trained 20 classifiers were averaged to obtain the final prediction result.
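
The two pair representations described above can be sketched with NumPy. This is a minimal illustration, assuming that SFR1 stacks one Jaccard similarity per feature type and that SFR2 concatenates the element-wise averages of the per-type binary vectors; the function names and the toy vectors (two feature types instead of four) are hypothetical.

```python
import numpy as np


def jaccard(u, v):
    """Jaccard index of two binary vectors (0.0 when both are all zeros)."""
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    union = np.logical_or(u, v).sum()
    return float(np.logical_and(u, v).sum() / union) if union else 0.0


def sfr1(feats_i, feats_j):
    """One Jaccard similarity per feature type (4D for four feature types)."""
    return np.array([jaccard(u, v) for u, v in zip(feats_i, feats_j)])


def sfr2(feats_i, feats_j):
    """Concatenation of the element-wise averages of each feature type."""
    return np.concatenate([
        (np.asarray(u, dtype=float) + np.asarray(v, dtype=float)) / 2.0
        for u, v in zip(feats_i, feats_j)
    ])


# Toy usage: two feature types with 3 and 2 bits, respectively.
drug_i = [[1, 1, 0], [1, 0]]
drug_j = [[1, 0, 0], [0, 1]]
f1 = sfr1(drug_i, drug_j)
f2 = sfr2(drug_i, drug_j)
```

In SFR2, an element equal to 1 marks a property shared by both drugs, 0.5 a property of only one drug and 0 a property of neither, which is how the representation captures shared properties.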

Meta-learning-based model
Similar to the above PUL-based approach, considering that it is difficult to obtain reliable negative samples, Deepika et al. [88] proposed a semi-supervised learning framework (Figure 2) for predicting DDIs by combining representation learning [89], PUL [90] and meta-learning [91]. Specifically, the authors first applied the same four types of drug features as the above PUL-based approach to construct a corresponding feature network for each type. Taking the feature network constructed from chemical structure as an example, there were two kinds of nodes in the network: drug nodes and chemical substructure nodes. If a drug has a certain substructure, the corresponding two nodes are connected by an edge; otherwise, they are not connected. The other three feature networks were constructed in a similar way. For each type of feature, drugs in the feature network were represented by a d-dimensional feature vector via node2vec [92], a representation learning algorithm that explores the neighborhood information of nodes in the network through biased random walks and then derives the features of the nodes. For the drug pair $(d_1, d_2)$, the feature vector of the pair was defined as the d-dimensional vector whose elements are the absolute values of the differences between the corresponding elements of the feature vectors of $d_1$ and $d_2$. Next, the authors trained a base classifier on each type of feature vector and calculated the weight of each base classifier by cross validation. Then, the product of the interaction probability predicted by a base classifier and the weight of that classifier was defined as the score of the corresponding drug pair. Therefore, based on the four types of features, four scores could be obtained. The final four-dimensional feature vectors of drug pairs were then constructed from these scores to train the meta-classifier, which predicts drug pairs with potential interactions among the unlabeled samples. It should be pointed out that the bagging SVM classifier constructed based on the PUL method was used as both the base classifier and the meta-classifier of this model.
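As a minimal illustration of the pair-feature and score-construction steps described above, the following Python sketch computes the absolute-difference vector of two drug embeddings and scales base-classifier probabilities by classifier weights. The embeddings, probabilities and weights are toy values standing in for node2vec output and cross-validation results, and the function names are hypothetical rather than taken from Deepika et al. [88].

```python
import numpy as np

def pair_feature(v1, v2):
    """Element-wise absolute difference of two drug embeddings."""
    return np.abs(np.asarray(v1, dtype=float) - np.asarray(v2, dtype=float))

def meta_features(base_probs, weights):
    """Scale each base classifier's probability by that classifier's weight.

    base_probs: (n_pairs, 4) probabilities, one column per feature type.
    weights:    (4,) classifier weights (e.g. estimated by cross validation).
    """
    return np.asarray(base_probs, dtype=float) * np.asarray(weights, dtype=float)

# Toy embeddings standing in for node2vec output (hypothetical values).
d1 = np.array([0.2, -0.5, 1.0])
d2 = np.array([0.7, -0.1, 0.4])
pair_vec = pair_feature(d1, d2)

# One drug pair, four base-classifier probabilities and four weights (toy values).
probs = np.array([[0.9, 0.6, 0.8, 0.5]])
w = np.array([0.4, 0.3, 0.2, 0.1])
meta_vec = meta_features(probs, w)  # four-dimensional input for the meta-classifier
```

Because the absolute difference is symmetric, the pair $(d_1, d_2)$ and the pair $(d_2, d_1)$ receive the same feature vector, which suits an undirected interaction label.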

Manifold regularized matrix factorization
Zhang et al. [93] presented a novel computational method named manifold regularized matrix factorization (MRMF) to predict potential DDIs by introducing drug feature-based manifold regularization into matrix factorization. The authors defined the adjacency matrix $A$ based on the known DDIs, where $A_{ij} = 1$ if drugs $d_i$ and $d_j$ have a known interaction and $A_{ij} = 0$ otherwise. Then, eight feature vectors of drugs were constructed based on the information about the substructures, targets, enzymes, transporters, pathways, indications, side effects and off side effects of drugs, respectively. Further, the Jaccard similarity matrix ($S_{Jar}$), cosine similarity matrix ($S_{Cos}$) and Gauss similarity matrix ($S_{Gau}$) of drugs were calculated based on the feature vectors, respectively. For matrix factorization, in order to approximate the matrix $A$, two low-rank matrices $X$ and $Y$ could be obtained by minimizing the following objective function: where $\|A - XY^{T}\|_F^2$ represents the least square cost function, which is used to ensure that the product of $X$ and $Y^{T}$ approximates the matrix $A$; $\|X\|_F^2 + \|Y\|_F^2$ is used to overcome the overfitting problem; $x_i$ refers to the $i$th row of the matrix $X$ and $y_j$ is the $j$th row of the matrix $Y$; $\lambda$ represents the Tikhonov regularization parameter. Given the wide applications of manifold learning [94,95], the authors treated the similarity between drugs as manifolds and assumed that drugs approximately maintained manifolds in the low-dimensional space. Then, the manifold regularizations for drugs in the low-dimensional space were defined as follows: where $S_{ij}$ is the similarity between drugs $d_i$ and $d_j$. By introducing the manifold regularization, the new objective function was defined as follows: where $\mu$ is the manifold regularization parameter. The alternating descent method was applied to minimize the above objective function to obtain the latent feature matrices $X$ and $Y$. Then, the interaction probability matrix $P$ can be calculated as follows: where the element $P_{ij}$ indicates the interaction probability between drugs $d_i$ and $d_j$.
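To make the idea of manifold-regularized factorization concrete, the sketch below fits a small example in Python. It is an illustration under stated assumptions, not the implementation of Zhang et al. [93]: the objective is assumed to be $\frac{1}{2}\|A - XY^{T}\|_F^2 + \frac{\lambda}{2}(\|X\|_F^2 + \|Y\|_F^2) + \frac{\mu}{2}\mathrm{tr}(X^{T} L X)$ with $L$ the graph Laplacian of the similarity matrix, the scores are assumed to be $XY^{T}$, and plain gradient descent is used in place of the alternating scheme mentioned above. All function names and hyperparameter values are hypothetical.

```python
import numpy as np

def mrmf_loss(A, X, Y, L, lam, mu):
    """Assumed objective: fit term + Tikhonov term + manifold (Laplacian) term."""
    fit = 0.5 * np.linalg.norm(A - X @ Y.T, "fro") ** 2
    ridge = 0.5 * lam * (np.linalg.norm(X, "fro") ** 2 + np.linalg.norm(Y, "fro") ** 2)
    manifold = 0.5 * mu * np.trace(X.T @ L @ X)
    return fit + ridge + manifold

def fit_mrmf(A, S, k=2, lam=0.1, mu=0.1, lr=0.01, n_iter=500, seed=0):
    """Gradient descent on the assumed objective (not the authors' solver)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = 0.1 * rng.standard_normal((n, k))
    Y = 0.1 * rng.standard_normal((n, k))
    L = np.diag(S.sum(axis=1)) - S  # graph Laplacian of the symmetric similarity matrix
    losses = []
    for _ in range(n_iter):
        R = X @ Y.T - A                          # residual of the current approximation
        grad_X = R @ Y + lam * X + mu * (L @ X)  # gradient with respect to X
        grad_Y = R.T @ X + lam * Y               # gradient with respect to Y
        X = X - lr * grad_X
        Y = Y - lr * grad_Y
        losses.append(mrmf_loss(A, X, Y, L, lam, mu))
    return X, Y, losses

# Toy data: six drugs in two groups; drugs interact and are similar within a group.
block = np.ones((3, 3)) - np.eye(3)
A = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
S = A.copy()  # toy similarity matrix reusing the same group structure
X, Y, losses = fit_mrmf(A, S)
P = X @ Y.T   # assumed form of the predicted score matrix
```

The Laplacian term penalizes latent vectors that differ between similar drugs, which is one common way to express the assumption that drugs keep their neighborhood structure in the low-dimensional space.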

DDINMF
Yu et al. [96] proposed a semi-nonnegative matrix factorization method to predict enhancive and degressive DDIs (DDINMF), where an enhancive (degressive) DDI refers to a drug increasing (decreasing) the serum concentration of itself and another drug when they are taken together. The authors considered m drugs with known interactions with other drugs as known drugs and n drugs without verified interactions with all known drugs as new drugs. In the training phase, different from the definition of the adjacency matrix in MRMF, after extracting the enhancive and degressive DDI information about the m known drugs, the adjacency matrix $A$ was constructed as follows:
$$A_{ij} = \begin{cases} 1, & \text{if there is an enhancive drug interaction between } d_i \text{ and } d_j \\ -1, & \text{if there is a degressive drug interaction between } d_i \text{ and } d_j \\ 0, & \text{otherwise} \end{cases}$$
The authors constructed the p-dimensional feature vectors of drugs based on their chemical structure and side effects. Furthermore, the feature vectors of known drugs were combined into the feature matrix $F$. Then, two nonnegative low-rank matrices ($W$ and $H$) used to approximate matrix $A$ could be obtained by minimizing the following objective function: After obtaining matrices $W$ and $H$ by the method proposed by Lee et al. [97], to make the model suitable for new drugs, the authors introduced the feature matrix into nonnegative matrix factorization. Specifically, they modeled the relationship between the feature matrix $F$ and $H$ as follows: where $B$ represents the regression coefficient matrix, which was calculated by the SIMPLS algorithm [98]. In the predicting phase, the feature matrix $F$ composed of the features of the n new drugs was mapped into the latent topological space as follows: Then, the interaction probabilities between the known drugs and new drugs were calculated as follows:
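The signed adjacency matrix described above can be sketched in a few lines of Python. The drug indices and interaction lists are toy data, the function name is hypothetical, and only the listed (i, j) entries are filled, since the review does not state whether the matrix is stored symmetrically.

```python
import numpy as np

def signed_adjacency(n_drugs, enhancive, degressive):
    """Build A with +1 for enhancive pairs, -1 for degressive pairs, 0 elsewhere."""
    A = np.zeros((n_drugs, n_drugs), dtype=int)
    for i, j in enhancive:
        A[i, j] = 1
    for i, j in degressive:
        A[i, j] = -1
    return A

# Toy example with four known drugs (hypothetical interactions).
A = signed_adjacency(4, enhancive=[(0, 1), (2, 3)], degressive=[(1, 2)])
```

The presence of negative entries is what motivates a semi-nonnegative rather than a fully nonnegative factorization of such a matrix.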

Triple matrix factorization-based unified framework
Similar to DDINMF, Shi et al. [99] presented a triple matrix factorization-based unified framework (TMFUF) to infer both enhancive and degressive DDIs. For m known drugs, the adjacency matrix $A$ was constructed in the same way as in DDINMF, but in TMFUF, the feature vectors were constructed based only on the side effect information of drugs. Then, the authors modeled the relationship between matrix $A$ and the feature matrix $F$ as a bilinear regression, which could be represented as the triple matrix factorization: where $\Theta$ refers to the symmetric projection matrix, whose role is to link the features of drugs with the interactions between drugs.
To obtain the matrix $\Theta$, the authors first calculated matrix $A_d^{*}$ as follows: where $A_d$ denotes the latent interaction matrix, each row of which refers to the feature of the corresponding drug in the latent space, and singular value decomposition was used to obtain $A_d^{*}$. Then, matrix $B^{*}$ was calculated: where $B$ represents the regression coefficient matrix. SIMPLS [98] was applied to solve the optimization problem to obtain $B^{*}$, and the matrix $\Theta$ was obtained as follows: Then, the interaction probability matrices could be calculated: where $F_n$ represents the feature matrix involving only new drugs; the elements of $P_{n,n}$ represent the interaction probabilities between new drugs, while the elements of $P_{n,m}$ refer to the interaction probabilities between new drugs and known drugs. That is, TMFUF could be used to predict the interactions of new drugs with both known drugs and other new drugs.

Local classification model via Dempster-Shafer theory of evidence
Under the assumption that similar drugs tend to interact with the same drug, Shi et al. [100] also proposed an integrated local classification model via Dempster-Shafer theory of evidence (LCM-DS) to predict potential DDIs. The authors first constructed the drug similarity matrix by directly averaging three different drug similarity matrices (i.e. the chemical structure-based, side effect-based and off-label side effect-based drug similarity matrices) derived from the work of Zhang et al. [101].
Then, based on the known DDIs as well as drug similarity, three local classification-based models (LCMs), constructed according to SVM [102], regularized least squares (RLS) [103] and multi-label K-nearest neighbors (MLKNN) [104], were applied to calculate the interaction probabilities between drugs, respectively. Finally, the authors proposed a novel fusion method based on the Dempster-Shafer theory of evidence [105] to integrate the results of the three LCMs to obtain the final interaction probabilities between drugs.
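The fusion step can be illustrated with Dempster's rule of combination on a two-hypothesis frame (interaction "I" versus no interaction "N"). The sketch below is only an illustration of the rule itself: the review does not specify how LCM-DS turns classifier outputs into mass functions, so the `to_mass` discounting scheme, the reliabilities and the probabilities are all hypothetical.

```python
from functools import reduce

def combine(m1, m2):
    """Dempster's rule on the frame {I, N}; key 'U' holds mass on the whole frame."""
    conflict = m1["I"] * m2["N"] + m1["N"] * m2["I"]
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    norm = 1.0 - conflict
    return {
        "I": (m1["I"] * m2["I"] + m1["I"] * m2["U"] + m1["U"] * m2["I"]) / norm,
        "N": (m1["N"] * m2["N"] + m1["N"] * m2["U"] + m1["U"] * m2["N"]) / norm,
        "U": m1["U"] * m2["U"] / norm,
    }

def to_mass(prob, reliability):
    """Hypothetical mapping: discount a classifier probability by a reliability in [0, 1]."""
    return {"I": reliability * prob, "N": reliability * (1.0 - prob), "U": 1.0 - reliability}

# Three toy classifier outputs for one drug pair, fused sequentially.
masses = [to_mass(0.8, 0.9), to_mass(0.7, 0.8), to_mass(0.6, 0.7)]
fused = reduce(combine, masses)
```

Mass left on the whole frame ("U") expresses how much a source is not trusted, so a less reliable classifier pulls the fused result less strongly toward its own prediction.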

DDIGIP
Based on the Gaussian Interaction Profile (GIP) kernel and the RLS classifier, Yan et al. [106] proposed a model named DDIGIP to predict potential DDIs. Specifically, the authors first calculated eight types of drug feature vectors in the same way as MRMF, which were concatenated to form the final feature vectors of drugs. Then, they used the Pearson correlation coefficient to calculate the similarities between drugs based on their respective feature vectors and constructed the drug similarity matrix $S_P$. Later, to make DDIGIP also applicable to new drugs, the authors used K-nearest neighbors (KNN) to calculate the initial relational score between new drug $d_i$ and known drug $d_j$ to fill in the adjacency matrix $A$: where $K_{set}^{i}$ represents the set of the top K drugs with the largest similarity to drug $d_i$. Next, they calculated the drug GIP similarity matrix $S_G$ from the filled adjacency matrix. Finally, the authors used the RLS classifier to compute the predicted interaction probability matrix $P$ as follows: where $I$ refers to the identity matrix and $\sigma$ represents the regularization parameter.
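The three steps above (KNN filling, GIP kernel, RLS scoring) can be sketched as follows. The exact formulas of Yan et al. [106] are not reproduced in this section, so the sketch relies on commonly used forms that should be read as assumptions: a similarity-weighted average over the K nearest known drugs for the filling step, the kernel $\exp(-\gamma \|a_i - a_j\|^2)$ with $\gamma$ normalized by the mean squared profile norm, and the closed-form RLS predictor $K (K + \sigma I)^{-1} A$. Function names and toy data are hypothetical.

```python
import numpy as np

def knn_fill(A, S, new_idx, known_idx, k=2):
    """Fill profiles of new drugs with a similarity-weighted average of K nearest known drugs."""
    A = A.astype(float).copy()
    known_idx = np.asarray(known_idx)
    for i in new_idx:
        order = np.argsort(S[i, known_idx])[::-1][:k]   # K most similar known drugs
        nbrs = known_idx[order]
        w = S[i, nbrs]
        if w.sum() > 0:
            A[i, known_idx] = w @ A[np.ix_(nbrs, known_idx)] / w.sum()
            A[known_idx, i] = A[i, known_idx]           # keep the matrix symmetric
    return A

def gip_kernel(A, gamma_prime=1.0):
    """Gaussian interaction profile kernel computed from the rows of A."""
    sq_norms = (A ** 2).sum(axis=1)
    gamma = gamma_prime / sq_norms.mean()
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (A @ A.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def rls_predict(K, A, sigma=1.0):
    """Closed-form regularized least squares scores: K (K + sigma I)^{-1} A."""
    n = K.shape[0]
    return K @ np.linalg.solve(K + sigma * np.eye(n), A)

# Toy data: drugs 0-2 are known, drug 3 is new (no verified interactions).
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]])
S = np.array([[1.0, 0.2, 0.6, 0.9], [0.2, 1.0, 0.3, 0.1],
              [0.6, 0.3, 1.0, 0.5], [0.9, 0.1, 0.5, 1.0]])
A_filled = knn_fill(A, S, new_idx=[3], known_idx=[0, 1, 2], k=2)
K = gip_kernel(A_filled)
P = rls_predict(K, A_filled, sigma=1.0)
```

In this toy case the new drug inherits the interaction profile of its two nearest neighbors (drugs 0 and 2), so its GIP similarity to them becomes maximal after filling.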

Gradient boosting-based model
Qian et al. [107] proposed an extreme gradient boosting (XGBoost) classifier for the prediction of DDIs by integrating multiple features of drug pairs (Figure 3). Different from the previous models, in which the feature vectors of drugs were constructed directly from side effect information, the authors downloaded the data on side effects from SIDER [66], where the Unified Medical Language System (UMLS) concept IDs [108] were used as the side effect identifiers. Then, according to the dictionary MedDRA [109], they mapped the UMLS concept IDs to MedDRA concept IDs at four different levels, including preferred term (PT), high-level term (HLT), high-level group term (HLGT) and system organ class (SOC). At each level, a binary vector was constructed for each drug as its feature vector. Furthermore, the Jaccard index was used to calculate the similarities between two drugs at the four levels, respectively, which constituted the side effect-based features of the corresponding drug pairs. The information about drug indications was also collected from SIDER [66] and mapped to the same four levels. Then, the indication-based features of drug pairs were obtained in a similar way. Moreover, the authors calculated the sequence similarities between the target proteins of two drugs by the Smith-Waterman algorithm [110] and then used the minimum, mean, median and maximum of the similarities between target proteins to construct the target sequence-based features of the corresponding drug pairs. Similarly, the interaction scores between genes were downloaded from the study of Costanzo et al. [111], and the minimum, mean, median and maximum of the interaction scores between the target protein genes of two drugs were used to construct the genetic interaction-based features of the corresponding drug pairs. Therefore, 16 features of each drug pair could be obtained by integrating the above four types of features. Then, a feature selection method known as group minimax concave penalty (MCP) [112] was applied to obtain 11 features with significantly different value distributions between interacting and noninteracting drug pairs, which formed the final feature vector for each drug pair. Finally, the authors applied the XGBoost classifier to calculate the interaction probability between the corresponding drugs. In addition, to obtain better prediction performance, the authors optimized the hyperparameters of the classifier using the tree-structured Parzen estimator (TPE) approach [113].
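The two feature-building operations used repeatedly above, a Jaccard index between term sets and a four-number summary of pairwise scores, can be sketched as follows. Sets of term names are used here as an equivalent of the binary vectors described in the text; the terms and scores are toy values, and the function names are hypothetical.

```python
from statistics import mean, median

def jaccard(a, b):
    """Jaccard index between two collections of term IDs (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def summary_features(scores):
    """Minimum, mean, median and maximum of pairwise scores between two drugs' targets."""
    return [min(scores), mean(scores), median(scores), max(scores)]

# Toy side effect terms of two drugs at one MedDRA level (hypothetical).
side_effects_a = {"nausea", "headache", "rash"}
side_effects_b = {"nausea", "rash", "dizziness"}
se_similarity = jaccard(side_effects_a, side_effects_b)

# Toy sequence similarities between every target of drug A and every target of drug B.
target_features = summary_features([0.2, 0.4, 0.9])
```

Repeating the Jaccard step at the four MedDRA levels for side effects and for indications, and the summary step for sequence similarities and genetic interaction scores, yields the 16 features mentioned in the text.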

Network algorithm and matrix perturbation algorithm-based model
By means of multisource data fusion, Zhang et al. [114] presented a flexible framework to integrate multiple models for DDI identification. Firstly, the authors applied the Jaccard index to calculate eight types of drug similarities based on the information about the substructure, target, enzyme, transporter, pathway, indication, side effect and off side effect of drugs, respectively. Besides, according to the DDI network constructed from known DDIs, they computed six other kinds of drug similarities, namely, common neighbor similarity, Adamic-Adar similarity, resource allocation similarity, Katz similarity, average commute time similarity and random walk with restart similarity. Furthermore, the authors adopted two similarity-based models (constructed based on the neighbor recommender algorithm [115] and the random walk algorithm [116]) and one model based only on known DDIs (built according to the matrix perturbation algorithm [117]) to predict potential DDIs. Thus, according to the above-mentioned 14 types of drug similarities and known DDIs, 29 prediction models, namely, 28 similarity-based models and one model based only on known DDIs, were constructed. Finally, the weighted average ensemble rule and the classifier ensemble rule were each implemented to fuse the prediction results. Specifically, the weighted average ensemble rule took the weighted average of the outputs of all prediction models, while the classifier ensemble rule used logistic regression to map the outputs of all models to a single score as the final prediction result.
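Three of the network-derived similarities named above have simple closed forms on an adjacency matrix, and the following sketch computes them with their standard definitions: the number of common neighbors, the Adamic-Adar index (sum of $1/\log k_z$ over common neighbors $z$ with degree $k_z$) and the resource allocation index (sum of $1/k_z$). The toy network and the function name are hypothetical, and the remaining three similarities are not covered.

```python
import numpy as np

def local_similarities(A):
    """Common neighbor, Adamic-Adar and resource allocation scores for all node pairs."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_log = np.zeros_like(deg)
    inv_log[deg > 1] = 1.0 / np.log(deg[deg > 1])  # degree-1 nodes contribute nothing
    inv_deg = np.zeros_like(deg)
    inv_deg[deg > 0] = 1.0 / deg[deg > 0]
    cn = A @ A                # entry (i, j): number of common neighbors
    aa = (A * inv_log) @ A    # each common neighbor z weighted by 1 / log(k_z)
    ra = (A * inv_deg) @ A    # each common neighbor z weighted by 1 / k_z
    return cn, aa, ra

# Toy DDI network with edges 0-2, 0-3, 1-2, 1-3 and 2-3.
A = np.zeros((4, 4))
for i, j in [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
cn, aa, ra = local_similarities(A)
```

Only the off-diagonal entries are meaningful as pair similarities; the diagonal of the common neighbor matrix simply equals the node degree.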

Heterogeneous network-assisted inference
Cheng et al. [118] proposed a heterogeneous network-assisted inference (HNAI) framework to identify potential DDIs. The authors first extracted 6 946 known DDIs from DrugBank [6] as positive samples, which formed the training set together with the same number of negative samples randomly selected from drug pairs without verified interactions. Then, the authors calculated four types of drug similarities: phenotypic similarity, therapeutic similarity, chemical structural similarity and genomic similarity. Specifically, the phenotypic, therapeutic and chemical structural similarities were calculated according to the method proposed in the authors' previous work [119]. As for genomic similarity, the authors first constructed a binary vector for each drug based on the target protein information, and then the Tanimoto coefficient [120] of the two binary vectors was regarded as the genomic similarity between the corresponding two drugs. The four types of similarity between two drugs constituted the feature vector of the corresponding drug pair. Finally, based on the drug pairs in the training set and the corresponding feature vectors, the authors trained five prediction models (namely, naive Bayes, decision tree, KNN, logistic regression and SVM) to identify potential DDIs. Moreover, the authors constructed the interaction network based on the known DDIs and carried out a statistical analysis by combining the above four types of similarity. The analysis showed that the more similar two drugs are, the higher the probability of an interaction between them.

Integrated action crossing
Hunta et al. [121] proposed an integrated action crossing (IAC) method to predict potential DDIs by focusing on drug-enzyme and drug-transporter actions. Different from the above models, where the feature vectors of drug pairs were constructed based on drug similarity, the authors proposed a method called action crossing (AC) to obtain the feature vectors of drug pairs according to the information about drug-enzyme actions [including substrate (S), inhibitor (Inh) and inducer (Inc)]. Specifically, for drug $d_i$ and enzyme $e_k$, the action attribute vector $X_{ik} = (x_{ik}^{S}, x_{ik}^{Inh}, x_{ik}^{Inc})$ was defined, where $x_{ik}^{S}$, $x_{ik}^{Inh}$ and $x_{ik}^{Inc}$ represent whether the drug $d_i$ has the corresponding action on the enzyme $e_k$. Taking $x_{ik}^{S}$ as an example, if drug $d_i$ is a substrate of enzyme $e_k$, the value of $x_{ik}^{S}$ is 1, otherwise 0. $x_{ik}^{Inh}$ and $x_{ik}^{Inc}$ could be obtained in a similar way. Then, the feature vector of drug pair $(d_i, d_j)$ based on enzyme $e_k$ was constructed: where if the $p$th ($p$ = 1-3) elements of vectors $X_{ik}$ and $X_{jk}$ were both 1, $f_{ijk}^{p}$ has a value of 1, otherwise 0. Similarly, they constructed transporter-based feature vectors of drug pairs based on the information about drug-transporter actions. The authors collected 36 enzymes as well as 35 transporters and sorted them based on their IDs. Then, they calculated the feature vectors based on each enzyme and transporter in turn and concatenated them to obtain the final feature vector. Finally, the final feature vectors of drug pairs were used to train three models (SVM, KNN and neural networks), which were then used to predict potential DDIs.
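The action crossing step amounts to an element-wise logical AND of the two drugs' action triples, repeated for every protein and concatenated. A minimal sketch is given below; the action triples are toy data and the function name is hypothetical.

```python
def action_crossing(actions_i, actions_j):
    """Cross two drugs' action vectors protein by protein.

    actions_i, actions_j: lists of (substrate, inhibitor, inducer) 0/1 triples,
    one triple per enzyme or transporter, listed in the same protein order.
    Returns the concatenated pair feature vector (three entries per protein).
    """
    features = []
    for x_ik, x_jk in zip(actions_i, actions_j):
        features.extend(int(a == 1 and b == 1) for a, b in zip(x_ik, x_jk))
    return features

# Toy example with two enzymes (hypothetical actions).
drug_i = [(1, 0, 0), (0, 1, 0)]
drug_j = [(1, 1, 0), (0, 1, 1)]
pair_vector = action_crossing(drug_i, drug_j)
```

Under this construction, 36 enzymes and 35 transporters with three actions each would give a pair vector of 3 × (36 + 35) = 213 entries.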

SFLLN
Zhang et al. [122] proposed the sparse feature learning ensemble method with linear neighborhood regularization, named SFLLN, to predict potential DDIs. Firstly, the authors constructed the corresponding binary feature vectors for each drug based on the information about substructure, target, enzyme and pathway, respectively. Then, the feature vectors of the same type for all drugs were combined into the feature matrix $F_i$ ($i$ = 1-4). Secondly, the authors projected drugs from different feature spaces to the common interaction space by approximating the interaction probability matrix $P$ with the product of the feature matrix $F_i$ and the nonnegative projection matrix $G_i$. Besides, they controlled the sparsity of the projection matrices by minimizing a sparsity term on each $G_i$ to improve the generalization ability of the model, where $n_i$ referred to the number of columns in matrix $G_i$. Therefore, they defined the objective function as follows: where $A$ refers to the adjacency matrix constructed based on known DDIs; $\lambda$ and $\mu$ refer to free parameters. Besides, by assuming that the predicted DDIs have the same structure as the known DDIs, the authors defined the Lagrangian function as follows to extract the data structure from known DDIs: where $\otimes$ denotes the Hadamard product; $C$ is an indicator matrix, where $C[i,j] = 0$ if $i = j$, otherwise $C[i,j] = 1$; $e$ is an $n_d$-dimensional vector with all elements being 1; $n_d$ is the total number of drugs; $\Phi$ refers to the Lagrange multiplier; matrix $W$ reflects the intrinsic structure of known DDIs. Then, $W$ was obtained by taking the derivative of $L$ with respect to $W$ and setting the derivative to 0.
In order to ensure that the drugs retain their internal structure after projection, matrices $P$ and $W$ should meet the following requirement: Therefore, by algebraic transformation, the authors defined the linear neighborhood regularization [123]: By combining Formulas (33) and (36), the authors defined the final objective function: The authors set the partial derivative of $L$ with respect to $P$ to 0 to obtain the relationship between $P$ and $G$: Then, Formula (37) could be rewritten as follows: (39) where all elements in matrix $E_k \in \mathbb{R}^{n_k \times n_k}$ are equal to 1 and $n_k$ represents the number of columns in $F_k$. Finally, the authors solved Formula (39) by the semi-nonnegative matrix factorization algorithm to get $G_k$ and then obtained the interaction probability matrix $P$ based on Formula (38).

Deep learning-based models
Given the successful application of deep learning algorithms in fields such as natural language processing (NLP) [124,125] and pattern recognition [126,127] in recent years, more and more researchers have built models based on deep learning methods to predict potential DDIs. Deep learning-based models can not only automatically extract the features of drugs but also effectively integrate these features through multiple modules to obtain the features of the corresponding drug pairs. In addition, NLP-based models can be applied to mine a large number of DDIs from the literature. However, as with the models constructed based on traditional machine learning algorithms, the scarcity of reliable negative samples severely limits the performance of deep learning-based models. Besides, because training must learn both the optimal parameter values and the most informative features of the drug pairs, deep learning-based models usually take more time to produce predictions. We briefly introduce some of these models below.

DDIMDL
Deng et al. [128] developed a multimodal deep learning framework to identify unknown DDIs (DDIMDL). Firstly, the authors constructed the corresponding binary feature vector for each drug based on the information about substructure, target, enzyme and pathway, respectively. Secondly, the Jaccard index was used to calculate the similarities between drugs, and four corresponding similarity matrices (i.e. $S_s$, $S_t$, $S_e$ and $S_p$) were constructed, where the $i$th row of each similarity matrix was regarded as the corresponding feature vector of drug $d_i$. Thirdly, for each similarity matrix, its $i$th and $j$th rows were fed into the DNN to calculate the interaction probability between $d_i$ and $d_j$. Finally, the authors took the average of the interaction probabilities obtained based on the four similarity matrices as the final result.
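The data flow of this design (binary features, Jaccard similarity matrix, row-based pair input, averaging across modalities) can be sketched without the neural networks themselves. In the sketch below, the concatenation of the two rows is an assumption about how the pair input is formed, the per-modality probabilities are toy numbers standing in for the outputs of four trained sub-networks, and all function names are hypothetical.

```python
import numpy as np

def jaccard_matrix(F):
    """Pairwise Jaccard similarity between the rows of a binary feature matrix."""
    F = np.asarray(F, dtype=float)
    inter = F @ F.T                                  # shared features per drug pair
    counts = F.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)

def pair_input(S, i, j):
    """Assumed pair representation: rows i and j of a similarity matrix, concatenated."""
    return np.concatenate([S[i], S[j]])

def average_probabilities(per_modality_probs):
    """Final score as the mean of the per-modality interaction probabilities."""
    return float(np.mean(per_modality_probs))

# Toy substructure matrix: three drugs, four substructures (hypothetical).
F_s = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1]])
S_s = jaccard_matrix(F_s)
x = pair_input(S_s, 0, 1)

# Toy outputs of the four sub-networks for one drug pair.
final_prob = average_probabilities([0.8, 0.6, 0.7, 0.9])
```

Using similarity rows rather than raw binary features gives every modality an input of the same length (the number of drugs), regardless of how many substructures, targets, enzymes or pathways exist.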

Substructure-substructure interaction for drug-drug interaction
Considering that DDIs are caused by the interactions between the substructures of the corresponding two drugs, Shi et al. [129] developed a deep learning-based model named substructure-substructure interaction for drug-drug interaction (SSI-DDI) to predict potential DDIs (Figure 4). Firstly, for drug $d_i$, the SMILES string of $d_i$ downloaded from DrugBank [6] was converted with the software RDKit (https://www.rdkit.org/) into a molecular graph, where nodes represented the atoms contained in $d_i$ and edges referred to the bonds between the corresponding atoms. Secondly, they built a module made up of four graph attention (GAT) layers in series (layer 1 → layer 2 → layer 3 → layer 4), where layer 1 was used to extract the 64-dimensional feature vector of each node based on the molecular graph, while the other three layers were applied to update the vectors output from the previous layer. Then, by integrating the feature vectors of all atom nodes in the molecular graph of $d_i$, the vector recording the substructure information extracted by the $k$th layer could be obtained: where $n$ is the total number of nodes; parameter $\beta_p$ indicates the importance of the $p$th node, which could be obtained by SAGPooling [130]; $v_i^{k,p}$ refers to the feature vector of the $p$th node extracted by the $k$th layer. For the substructure of $d_i$ extracted by the $k_1$th layer and the substructure of $d_j$ extracted by the $k_2$th layer, the co-attention mechanism was applied to calculate the importance score $r_{k_1 k_2}$ of the interaction between these two substructures for the final DDI prediction. Finally, the interaction probability between $d_i$ and $d_j$ could be obtained by integrating the interactions between the substructures of the two drugs: where $\sigma$ refers to the sigmoid function; $M$ is a learnable matrix, which was obtained by training SSI-DDI on 1024 known DDIs using the Adam optimizer [131].

Substructure-aware tensor neural network model for DDI prediction
Shi et al. [132] also constructed another model named substructure-aware tensor neural network model for DDI prediction (STNN-DDI), which could be used to predict the types of DDIs. Firstly, each drug was represented by an $n$-dimensional binary vector, where $n$ represents the total number of substructures under study. For example, drug $d_i$ could be denoted by the vector $e^{i} = (e^{i}_{1}, e^{i}_{2}, \ldots, e^{i}_{n})$, where the value of $e^{i}_{j}$ is 1 if $d_i$ contains the $j$th substructure and 0 otherwise. Then, for drugs $d_i$ and $d_j$, the authors used $P^{d}_{ijk}$ to represent the probability of the $k$th type of interaction between $d_i$ and $d_j$, which could be defined as follows:
$$P^{d}_{ijk} = \sum_{p=1}^{n} \sum_{q=1}^{n} P^{s}_{pqk}\, e^{i}_{p}\, e^{j}_{q} \quad (42)$$
where $P^{s}_{pqk}$ refers to the probability that there is the $k$th type of interaction between the $p$th substructure and the $q$th substructure. In order to get the interaction probabilities between substructures, the authors constructed a tensor named $ST$, where both axes x and y represented the substructures, while the axis z referred to the DDI types, and set $ST_{pqk} = P^{s}_{pqk}$. Therefore, Formula (42) could be rewritten as
$$P^{d}_{ijk} = \sum_{p=1}^{n} \sum_{q=1}^{n} ST_{pqk}\, e^{i}_{p}\, e^{j}_{q} \quad (43)$$
Based on the CP decomposition [133], the tensor $ST$ could be approximated by three factor matrices $A$, $B$ and $C$: where $\lambda$ denotes the $r$-dimensional vector used to normalize the columns of the three factor matrices; $A \in \mathbb{R}^{n \times r}$, $B \in \mathbb{R}^{n \times r}$ and $C \in \mathbb{R}^{f \times r}$ record the latent information of the x, y and z axes, respectively; $r$ is the number of rank-one tensors into which $ST$ is decomposed; $f$ represents the total number of DDI types. Besides, by assuming that the rows of matrices $A$ and $B$ represent the embeddings of the corresponding substructures, while the rows of $C$ refer to the embeddings of the corresponding DDI types, the authors introduced the multi-linear tensor transformation [134] to define the interaction probability $P^{d*}_{ijk}$ of the $k$th type of interaction between $d_i$ and $d_j$: where $\times_n$ refers to the mode-n product; $v_k$ represents the one-hot vector encoding the $k$th type of interaction; parameter bias is added to enhance the robustness of STNN-DDI. Then, they defined the loss function $F$: where $D_{train}$ is the training set consisting of all positive samples and an equal number of randomly selected negative samples; the value of $PD_{ijk}$ is 1 if there is the $k$th type of interaction between drugs $d_i$ and $d_j$ and 0 otherwise. Finally, the authors constructed a fully connected neural network model based on Formulas (45) and (46) to obtain matrices $A$, $B$, $C$, $\lambda$ and bias according to the training set. Then, the interaction probability between drugs could be calculated based on Formula (45).
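The substructure-level scoring and its CP-factorized form can be checked numerically with a short sketch. It assumes the bilinear form in which the score for type $k$ is the sum of $ST_{pqk} e^{i}_{p} e^{j}_{q}$ over all substructure pairs, and the standard CP model in which $ST_{pqk}$ is the sum over $r$ of $\lambda_r A_{pr} B_{qr} C_{kr}$. The bias term and training procedure of STNN-DDI are not modeled; the factor matrices are random toy values and the function names are hypothetical.

```python
import numpy as np

def cp_reconstruct(lam, A, B, C):
    """Rebuild the tensor ST[p, q, k] = sum_r lam[r] * A[p, r] * B[q, r] * C[k, r]."""
    return np.einsum("r,pr,qr,kr->pqk", lam, A, B, C)

def pair_type_scores(ST, e_i, e_j):
    """Score per DDI type: sum over substructure pairs present in the two drugs."""
    return np.einsum("pqk,p,q->k", ST, e_i, e_j)

def pair_type_scores_cp(lam, A, B, C, e_i, e_j):
    """Same scores computed from the factors, without forming the full tensor."""
    return ((e_i @ A) * (e_j @ B) * lam) @ C.T

rng = np.random.default_rng(0)
n, f, r = 5, 3, 2                       # substructures, DDI types, CP rank (toy sizes)
lam = rng.random(r)
A, B, C = rng.random((n, r)), rng.random((n, r)), rng.random((f, r))
e_i = np.array([1.0, 0.0, 1.0, 0.0, 0.0])  # drug i contains substructures 0 and 2
e_j = np.array([0.0, 1.0, 1.0, 0.0, 1.0])  # drug j contains substructures 1, 2 and 4

ST = cp_reconstruct(lam, A, B, C)
scores_full = pair_type_scores(ST, e_i, e_j)
scores_cp = pair_type_scores_cp(lam, A, B, C, e_i, e_j)
```

Working with the factors avoids storing the full n × n × f tensor, whose size grows quadratically with the number of substructures.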

META-DDIE
Deng et al. [135] proposed a few-shot computational model named META-DDIE to predict the types of DDIs, which consisted of a representation module and a comparing module. In the representation module, the authors first constructed a binary vector $S_i$ for drug $d_i$ based on its structure information. For drug pair $d_i$-$d_j$, its feature vector $F_{i,j}$ was defined as follows: Then, the authors employed a neural network to encode the vector $F_{i,j}$ into an embedding vector $E^{1}_{i,j}$ and applied another neural network to decode a reconstructed feature vector from $E^{1}_{i,j}$. To train the framework consisting of the encoder and decoder, the authors defined the loss function: where $n$ is the dimension of the vector $F_{i,j}$. Secondly, based on the SMILES of drugs, the authors applied a chemical sequential pattern mining (SPM) algorithm [136] to obtain a set of discrete frequent substructures of drugs in the database. The $k$th frequent substructure (denoted by a one-hot vector $v_k$) was fed into the above neural network for encoding to obtain the corresponding embedding vector $E^{2}_{k}$. Then, the vector $E^{1}_{i,j}$ could be projected onto the subspace defined by $\mathrm{span}(E^{2}_{1}, E^{2}_{2}, \ldots, E^{2}_{n_s})$: where $r^{k}_{i,j}$ ($k = 1, 2, \ldots, n_s$) represents the projection coefficient, which could be calculated via the method proposed by Huang et al. [137]; $n_s$ denotes the total number of frequent substructures. The vector $r_{i,j} = (r^{1}_{i,j}, r^{2}_{i,j}, \ldots, r^{n_s}_{i,j})$ was regarded as the final representation of the drug pair $d_i$-$d_j$. For the few-shot learning, the authors divided drug pairs into training sets and test sets, each of which was further divided into a support set and a query set. For drug pairs $DP_p$ in the support set and $DP_q$ in the query set, their representations were fed into the comparing module constructed as in the study [138] to obtain an $n_t$-dimensional similarity vector $S_{p,q}$ between the two drug pairs, where $n_t$ refers to the number of DDI types. Then, they defined the loss function based on the mean square error to train the model: where $n_p$ and $n_q$ represent the number of drug pairs in the support set and the query set, respectively. If $DP_p$ and $DP_q$ have the same type of DDI, the value of $l_{p,q}$ is 1, otherwise 0. Then, the model was trained by minimizing the loss function $L$. Finally, the authors applied the trained model to calculate the similarity vector $S_{x,y}$ between drug pairs $DP_x$ in the support set and $DP_y$ in the query set. The DDI type corresponding to the maximum value in the vector $S_{x,y}$ was regarded as the type of drug pair $DP_y$.

Deep attention neural network-based drug-drug interaction prediction model
Liu et al. [139] developed a deep attention neural network-based drug-drug interaction prediction model (DANN-DDI) to identify potential DDIs. Specifically, the authors first constructed five networks, including the drug-substructure network, drug-target network, drug-enzyme network, drug-pathway network and DDI network. For drug $d_i$, the authors applied the structural deep network embedding method [140,141] to learn its embeddings (i.e. $E^{s}_{i}$, $E^{t}_{i}$, $E^{e}_{i}$, $E^{p}_{i}$ and $E^{d}_{i}$) from the above networks, respectively. Then, the authors constructed the comprehensive vector of $d_i$ based on the above embeddings. For drugs $d_i$ and $d_j$, their comprehensive vectors were used as the input of the attention neural network [142] to obtain the feature vector of drug pair $d_i$-$d_j$. Finally, the feature vector was fed into the framework consisting of the input layer, multiple fully connected hidden layers and the output layer, and the softmax function was applied to calculate the interaction probability between $d_i$ and $d_j$ based on the output of the framework.

Multi-relational contrastive learning graph neural network
Xiong et al. [143] proposed a model named multi-relational contrastive learning graph neural network (MRCGNN) to predict the types of DDIs. Firstly, by taking drugs as nodes and known DDIs as edges, the authors constructed a multi-relational DDI event graph $G = (V, E, T)$, where $V$ and $E$ represent the sets of all drug nodes and edges, respectively, and $T$ denotes the set of all DDI types. Secondly, after obtaining the molecular graph of each drug in the same way as in SSI-DDI, the authors utilized TrimNet [144] to extract the features of drugs from the corresponding molecular graphs and constructed the feature matrix $F$ by combining the features of all drugs. Then, the authors employed the relational graph convolutional network (R-GCN) encoder [145] to learn the original representation vectors of drugs from the graph $G$ with the features of drugs as node attributes, and the representations of all drugs formed the matrix $H$. Besides, the global representation $g$ was defined by means of a readout function. Thirdly, in order to implement multi-relational contrastive learning on $G$, the authors corrupted the graph $G$ by shuffling the features of drug nodes and the edges to obtain the corrupted graphs $G_v = (\tilde{V}, E, T)$ and $G_e = (V, \tilde{E}, T)$, which were fed into the R-GCN encoder to obtain the corresponding drug representation matrices $H_v$ and $H_e$, respectively. Given that the training goal of contrastive learning was to maximize the consistency between $H$ and $g$, as well as the difference between $H_v$/$H_e$ and $g$, the authors defined two loss functions, in which $W$ represents a trainable parameter matrix applied to $g$ and $n_d$ refers to the total number of drugs. For drug pair $d_i$-$d_j$, the authors concatenated the corresponding features and representations (i.e. $F[i,]$, $F[j,]$, $H[i,]$ and $H[j,]$) to obtain the final representation $r_{i,j}$ of the drug pair, which was fed into the multilayer perceptron (MLP) followed by a softmax function to implement multi-class prediction: where $P_{i,j}$ is an $n_t$-dimensional vector; $n_t$ is the number of DDI types; $P_{i,j}[k]$ represents the probability of the $k$th type of interaction between drugs $d_i$ and $d_j$. Then, the authors defined another loss function: where $\Omega$ represents the training set; $L^{k}_{i,j}$ is the true label of drug pair $d_i$-$d_j$: if there is the $k$th type of interaction between $d_i$ and $d_j$, $L^{k}_{i,j}$ has the value 1, otherwise 0. To train MRCGNN, the authors defined the final loss function based on the above three loss functions: where $\alpha$ and $\beta$ refer to the hyperparameters used to balance the different loss functions.

Multichannel feature fusion model for multi-typed DDI prediction
Chen et al. [146] developed a multichannel feature fusion model for multi-typed DDI prediction (MCFF-MTDDI), which consisted of three modules: a feature extraction module, a feature fusion module and a classifier module (Figure 5). The authors first removed all <DRUGBANK::ddi-interactorin::Compound::Compound> edges to eliminate the DDI information from the drug repurposing knowledge graph (DRKG) [147]. After the isolated drug nodes were removed, the remaining triples made up the biomedical knowledge graph (KG) dataset. In the feature extraction module, the authors extracted three types of KG representations for each drug from the KG dataset (namely, initial embedding representation, subgraph mean representation and subgraph frequency representation). They also obtained the Morgan fingerprint vector [148] of each drug from its SMILES string using RDKit (https://www.rdkit.org/). Then, the Morgan fingerprint vectors of the two drugs were spliced together to construct the chemical structure-based feature of the corresponding drug pair. Moreover, the authors constructed the extra label-based feature of each drug pair for multi-label prediction. Specifically, after removing the drugs without SMILES, there were 12 362 different DDI types in the dataset, corresponding to 12 362 labels. Then, from the DDI labels involving more than 10 000 drug pairs, the authors selected the 200 labels involving the fewest drug pairs to build the target label set, and the remaining 12 162 labels made up the extra label set. For the drug pair $(d_i, d_j)$, a 12 162-dimensional binary vector was constructed as its extra label vector, whose kth element $h^k_{ij}$ is 1 if there is a kth type of interaction between $d_i$ and $d_j$, and 0 otherwise. Then, principal component analysis (PCA) was applied to reduce the dimension of the extra label vector to obtain the vector $H_{EL}(d_i, d_j) \in \mathbb{R}^{300}$, from which the extra label-based feature vector $F_{EL}(d_i, d_j)$ was computed through a trainable transformation, where W represents the trainable weight and b refers to the bias.
In the feature fusion module, the state encoder consisting of two fully connected layers and two state vector strategy blocks was used to integrate the KG representations to obtain the KG fusion representations of the drug pair $(d_i, d_j)$. Moreover, the chemical structure-based feature and KG fusion representations were input into a GRU-based multichannel feature fusion framework to obtain the fused feature vector $F_{FU}(d_i, d_j)$ of the drug pair $(d_i, d_j)$.
In the classifier module, the vector $F_{FU}(d_i, d_j)$ was used as the input of the multi-class classifier to implement the multi-class classification tasks, while the extra label-based feature vector $F_{EL}(d_i, d_j)$ and the vector $F_{FU}(d_i, d_j)$ were concatenated and input into the multi-label classifier to implement the multi-label classification.
It should be pointed out that both classifiers consisted of two fully connected layers, where the number of neurons in the last layer was equal to the number of DDI types of the corresponding classification task.
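The construction of the extra label vector described above can be illustrated with a minimal sketch. The toy drug pairs, label ids and the helper name `extra_label_vector` are assumptions made for illustration; the original work uses 12 162 extra labels and follows this step with PCA, which is not shown here.

```python
# Hypothetical toy data: each unordered drug pair maps to the set of
# DDI type ids (labels) recorded for it.
pair_labels = {
    frozenset({"d1", "d2"}): {0, 3},
    frozenset({"d1", "d3"}): {2},
    frozenset({"d2", "d3"}): set(),
}

def extra_label_vector(pair, extra_labels):
    """Return a binary vector h with h[k] = 1 if the pair shows the k-th extra label."""
    observed = pair_labels.get(frozenset(pair), set())
    return [1 if label in observed else 0 for label in extra_labels]

# Stand-in for the extra label set (12 162 labels in the original study).
extra_labels = [0, 1, 2, 3]
vec = extra_label_vector(("d1", "d2"), extra_labels)
```

Storing pairs as `frozenset` keys makes the lookup independent of the order in which the two drugs are given.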

DSIL-DDI
Tang et al. [149] proposed a model called DSIL-DDI to predict potential DDIs by implementing causal representation learning [150] on the substructures of drugs. Specifically, for drug $d_i$, the authors first constructed its molecular graph in the same way as in SSI-DDI and input it into graph neural networks (GNNs) to obtain its substructure representations, where the pth substructure was represented by the vector $S^i_p$. Given the representations of two substructures, $S^i_p$ of $d_i$ and $S^j_q$ of $d_j$, the priori representation $ssi_{pq}$ of the interaction between them was defined using a learnable weight matrix $W_{SSI}$. Then, attention weights, computed by a multilayer perceptron (MLP) applied to concatenated (concat) representations, were multiplied by $ssi_{pq}$ to obtain the posteriori representation. In a similar way, the authors calculated the posteriori representations between all substructures of $d_i$ and all substructures of $d_j$, which were integrated to obtain the substructure interaction matrix, where the rows and columns represented the substructures of the two drugs, respectively, and each element was the posteriori representation of the corresponding two substructures. Finally, the substructure interaction matrix was input into a single-layer linear network to obtain the interaction probability between $d_i$ and $d_j$.

DSN-DDI
Based on the intra-view and inter-view representation learning methods, Li et al. [151] developed a novel model named DSN-DDI to identify potential DDIs. Firstly, the authors obtained the molecular graph of each drug in the same way as in SSI-DDI. Besides, for the drug pair $(d_i, d_j)$, they built a bipartite graph by connecting each atom node in the molecular graph of $d_i$ with all nodes in the molecular graph of $d_j$ in turn. Secondly, to learn the representations of nodes in the graphs, the authors constructed four identical DSN encoders (each composed of the representation extraction layer, the intra-view layer and the inter-view layer), which were connected in series to form the DSN encoder module. Specifically, in the first DSN encoder, the molecular graph of drug $d_i$ was input into the representation extraction layer to obtain the representation of each node. Then, in the intra-view layer, the GAT [152] was applied to update the node representations by capturing the interactions between the atoms of $d_i$. The node representations of drug $d_j$ were extracted and updated in a similar way. Besides, the bipartite graph was fed into the representation extraction layer to extract the representation of each node. Then, in the inter-view layer, the node representations of the two drugs were updated by capturing the interactions between the atoms of the two drugs via the co-attentional mechanism. The other three DSN encoders were applied in turn to update the output of the previous encoder. Then, the output of the DSN encoder module was input into the self-attention graph (SAG) pooling layer to learn the drug representations and obtain the embedding vectors of $d_i$ and $d_j$.
Finally, the co-attention scoring function was used to calculate the interaction probability of corresponding drug pair based on the embedding vectors.
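The bipartite graph used for the inter-view layer can be sketched as follows. This is only an illustration of the edge construction (every atom of one drug linked to every atom of the other); the function name and the `("i", index)` node encoding are assumptions, not the authors' data structures.

```python
from itertools import product

def bipartite_edges(atoms_i, atoms_j):
    """Connect every atom node of drug d_i with every atom node of drug d_j."""
    # Nodes are tagged with "i" or "j" so that atom indices of the two drugs do not clash.
    return [(("i", a), ("j", b)) for a, b in product(atoms_i, atoms_j)]

# Toy example: d_i has three atoms and d_j has two atoms.
edges = bipartite_edges([0, 1, 2], [0, 1])
```

The number of edges is the product of the two atom counts, so the bipartite graph grows quickly for large molecules.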

BioDKG-DDI
Based on the self-attention mechanism [153], Ren et al. [154] constructed a model named BioDKG-DDI to predict potential DDIs (Figure 6). Firstly, based on the molecular structure information of drugs recorded in DrugBank [6], a novel molecular representation method named Mol2Context-vec [155] was used to extract the molecular structure features of drugs. Secondly, according to four types of association information recorded in DrugBank (namely, drug-carrier, drug-enzyme, drug-target and drug-transporter associations), they constructed the drug knowledge graph (DKG), where nodes represented biological entities and edges referred to the corresponding associations. Then, ComplEx-DURA [156] was applied to extract the global association features of drugs based on the DKG. Thirdly, according to the drug-carrier association information, the authors constructed an adjacency matrix, where rows and columns represent drugs and carriers, respectively. The Euclidean distance between the ith row and the jth row of the adjacency matrix was calculated as the similarity between drugs $d_i$ and $d_j$, which yielded the similarity matrix based on the drug-carrier association information. The corresponding drug similarity matrices were obtained from the other three types of associations in a similar way. Then, the similarity network fusion method [157] was used to integrate the four similarity matrices into the final similarity matrix, where each row was regarded as the similarity features of the corresponding drug. Finally, the authors used the self-attention mechanism to integrate the above three types of features of each drug into the final feature and input the final features of the two drugs into deep neural networks (DNNs) to obtain the interaction probability of the corresponding drug pair.
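As a minimal sketch of the row-wise computation described above, the following code takes a toy drug-carrier adjacency matrix and computes the Euclidean distance between every pair of rows. The matrix values and the function name are assumptions for illustration; the text reports that this distance value is used as the drug similarity, and any further rescaling would be an additional assumption that is not made here.

```python
import math

# Hypothetical drug-carrier adjacency matrix: rows are drugs, columns are carriers.
adjacency = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]

def pairwise_row_distance(matrix):
    """Euclidean distance between every pair of rows of a drug-by-entity matrix."""
    n = len(matrix)
    return [[math.dist(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]

dist = pairwise_row_distance(adjacency)
```

The same function could be applied to the drug-enzyme, drug-target and drug-transporter matrices before the four results are fused.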

MDF-SA-DDI
Based on multisource feature fusion and the self-attention mechanism [158], Lin et al. [159] developed a model named MDF-SA-DDI to predict DDIs (Figure 7). Firstly, for each drug, three binary vectors were constructed based on the information about its targets, enzymes and chemical structure, respectively. Then, based on each type of binary vector, the authors calculated the similarity between drugs with the Jaccard index and constructed the corresponding similarity matrix. The ith rows of the three similarity matrices were spliced as the feature vector of drug $d_i$. Secondly, the authors used the Siamese network [160], convolutional neural networks (CNNs) and autoencoders with the self-attention mechanism to fuse the feature vectors of the two drugs, each producing one feature vector of the drug pair $(d_i, d_j)$. Finally, the multi-head self-attention mechanism was applied to integrate the above three types of feature vectors of the drug pair $(d_i, d_j)$ into the final feature vector, which was input into the fully connected layer to calculate the interaction probability between $d_i$ and $d_j$.
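The first step (Jaccard similarity matrices whose rows are spliced into a drug feature vector) can be sketched as below. The binary profiles are toy values, the function names are hypothetical, and returning 0.0 for two all-zero vectors is a convention assumed here rather than something stated in the text.

```python
def jaccard(u, v):
    """Jaccard index of two binary vectors: |both set| / |either set|."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0  # assumed convention for empty profiles

def similarity_matrix(vectors):
    return [[jaccard(u, v) for v in vectors] for u in vectors]

# Hypothetical binary profiles for three drugs.
targets = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
enzymes = [[1, 0], [1, 0], [0, 1]]
substructures = [[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]

matrices = [similarity_matrix(m) for m in (targets, enzymes, substructures)]

def drug_feature(i):
    """Splice the i-th row of each similarity matrix into one feature vector."""
    return [value for matrix in matrices for value in matrix[i]]

feature_0 = drug_feature(0)
```

With n drugs and three information sources, each drug feature vector has 3n entries.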

Deep feed-forward network-based model
Lee et al. [161] proposed a novel DDI prediction model based on autoencoders and the deep feed-forward network. Firstly, the authors calculated three types of drug similarity. Taking drugs $d_i$ and $d_j$ as an example, a binary vector was constructed based on the substructure information of each drug, and the Tanimoto coefficient of the two vectors was calculated as the substructure-based similarity between $d_i$ and $d_j$. Besides, based on the target genes of drugs, the authors calculated the target gene-based similarity from the distances between target genes in the functional interaction (FI) network downloaded from BioGrid [162], where $G_i$ and $G_j$ represent the sets of target genes of drugs $d_i$ and $d_j$, respectively, $(x, y)$ refers to a gene pair composed of genes x and y, and $d(x, y)$ is the distance between x and y in the FI network. In a similar way, the GO term-based drug similarity could be calculated according to the GO terms and the GO graph [163]. Then, the corresponding similarity matrices were constructed based on the three types of similarity, respectively. Secondly, for the drug pair $(d_i, d_j)$, the ith and jth rows of each similarity matrix were input into the autoencoder to obtain the feature vector of $(d_i, d_j)$. Finally, the three feature vectors of the drug pair were spliced to obtain the final feature vector, which was input into the deep feed-forward network to obtain the interaction probability between $d_i$ and $d_j$.
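The quantity $d(x, y)$ can be illustrated with a breadth-first search over a toy network. The sketch below only computes shortest-path distances between the target genes of two drugs; how these distances are aggregated into a similarity score is not reproduced here. The gene names, the network and the choice of shortest-path length as the distance are assumptions for illustration.

```python
from collections import deque

# Hypothetical FI network stored as adjacency lists (undirected).
fi_network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],
}

def gene_distance(graph, source, target):
    """Number of edges on a shortest path, or None if the genes are not connected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour == target:
                return depth + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return None

def pair_distances(genes_i, genes_j):
    """d(x, y) for every gene pair (x, y) with x in G_i and y in G_j."""
    return {(x, y): gene_distance(fi_network, x, y) for x in genes_i for y in genes_j}

distances = pair_distances(["A"], ["D", "E"])
```

Unreachable gene pairs are reported as `None` so that the caller can decide how to treat them.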

R²-DDI
Lin et al. [164] developed a model named relation-aware feature refinement for DDI prediction (R²-DDI) to predict potential DDIs. Specifically, for drug $d_i$, the authors first constructed its molecular graph in the same way as in SSI-DDI and input it into DeeperGCN to obtain the graph feature vector $E_i$. The kth type of interaction was represented by a learnable vector $T_k \in \mathbb{R}^d$, where d represents the dimension of the interaction feature. Then, to model the relationship among $E_i$, $E_j$ and $T_k$, the authors calculated refinement vectors with an MLP and added them to the corresponding original feature vectors to obtain the refined features. Finally, the probability of the kth type of interaction between $d_i$ and $d_j$ was calculated based on the refined features.

Graph kernel-based approach
In view of the successful application of NLP in biomedicine [165] and computational biology [166], Zhang et al. [167] proposed a novel model to detect rapidly accumulating PK DDIs from the biomedical literature. Unlike the above models, which make predictions based on databases of known DDIs, this study employed the PK DDI corpus built by Wu et al. [168], which comprises 428 abstracts from the literature on PK DDIs. Drug pairs consisting of two drugs appearing in the same sentence were considered as candidate samples. Each sentence with candidate samples was represented by a dependency graph (constructed based on the syntactic structure of the sentence) [169] as well as a shallow semantic graph (built according to the shallow semantic relation structure of the sentence) [170]. Then, according to the method proposed by Airola et al. [171], the authors constructed all-path graph kernels to describe the connections between the syntactic and semantic structures within the sentences. Finally, the graph kernels were used to train the least squares SVM classifier [171], which was applied to identify potential PK DDIs from the literature.
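The candidate-sample step (two drugs appearing in the same sentence) can be sketched as follows. The tokenisation by whitespace, the small drug lexicon and the example sentence are simplifying assumptions; the original study works on an annotated corpus rather than on dictionary matching.

```python
from itertools import combinations

def candidate_pairs(sentence, drug_lexicon):
    """Return all pairs of distinct lexicon drugs that co-occur in one sentence."""
    tokens = [token.strip(".,;()").lower() for token in sentence.split()]
    mentioned = []
    for token in tokens:
        if token in drug_lexicon and token not in mentioned:
            mentioned.append(token)  # keep first-mention order, skip repeats
    return list(combinations(mentioned, 2))

# Hypothetical lexicon and illustrative sentence.
lexicon = {"ketoconazole", "midazolam", "rifampin"}
pairs = candidate_pairs(
    "Ketoconazole increased midazolam exposure, whereas rifampin decreased it.",
    lexicon,
)
```

A sentence mentioning k drugs yields k(k-1)/2 candidate pairs, each of which is then classified separately.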

Semantic predication-based model
Based on two widely used NLP tools, MetaMap [172] and SemRep [173], Zhang et al. [174] proposed a method to identify potential DDIs via semantic predications. Firstly, the authors extracted the drug list from clinical data and used MetaMap to map the drugs to concepts in UMLS [108]. Secondly, from SemMedDB (a database composed of semantic predications generated by SemRep) [175], they extracted four types of semantic predication (namely, drug-predicate-biological function, gene-predicate-biological function, gene-predicate-drug and drug-predicate-gene), where each semantic predication is a subject-predicate-object triplet with UMLS concepts as subject and object and a semantic relationship from the UMLS semantic network as predicate. Thirdly, gene names were normalized to approved gene symbols based on the Gene Nomenclature Committee dataset [176]. Fourthly, all drug-drug pairs that could be formed by combining semantic predications according to two types of pathway schemas were collected. In the first schema, drug $d_i$ affects drug $d_j$ by acting on gene $g_k$ (i.e. $d_i \to g_k \to d_j$). In the second schema, $d_i$ affects $g_{k1}$, while $d_j$ affects $g_{k2}$, where both $g_{k1}$ and $g_{k2}$ regulate the same biological function (i.e. $d_i \to g_{k1} \to$ biological function $\leftarrow g_{k2} \leftarrow d_j$). Finally, the predicted potential DDIs were obtained by filtering out known DDIs from the collected drug-drug pairs.
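A minimal sketch of the two pathway schemas is given below. The triples, drug and gene names, and the set of known DDIs are toy values, and the function names are hypothetical; the sketch also assumes that known pairs are stored with the same orientation as the collected pairs.

```python
# Hypothetical semantic predications as (subject, predicate, object) triples.
drug_gene = [
    ("drugA", "INHIBITS", "CYP3A4"),
    ("drugC", "STIMULATES", "GENE1"),
    ("drugD", "INHIBITS", "GENE2"),
]
gene_drug = [("CYP3A4", "INTERACTS_WITH", "drugB")]
gene_function = [
    ("GENE1", "AFFECTS", "coagulation"),
    ("GENE2", "AFFECTS", "coagulation"),
]

def schema_one(drug_gene, gene_drug):
    """Schema 1: d_i -> g_k -> d_j."""
    pairs = set()
    for d_i, _, gene in drug_gene:
        for gene2, _, d_j in gene_drug:
            if gene == gene2 and d_i != d_j:
                pairs.add((d_i, d_j))
    return pairs

def schema_two(drug_gene, gene_function):
    """Schema 2: d_i -> g_k1 -> biological function <- g_k2 <- d_j."""
    functions_of = {}
    for gene, _, function in gene_function:
        functions_of.setdefault(gene, set()).add(function)
    pairs = set()
    for d_i, _, g1 in drug_gene:
        for d_j, _, g2 in drug_gene:
            shared = functions_of.get(g1, set()) & functions_of.get(g2, set())
            if d_i < d_j and shared:  # d_i < d_j keeps each unordered pair once
                pairs.add((d_i, d_j))
    return pairs

known_ddis = {("drugC", "drugD")}  # assumed set of already known DDIs
candidates = schema_one(drug_gene, gene_drug) | schema_two(drug_gene, gene_function)
predicted = candidates - known_ddis
```

In this toy example only the pair found through the first schema remains after the known DDI is filtered out.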

Att-BLSTM
Zheng et al. [177] proposed a model named Att-BLSTM to extract DDIs from the biomedical literature by combining the attention mechanism and the recurrent neural network (RNN) with bidirectional long short-term memory (BLSTM). The network architecture of Att-BLSTM was made up of six components, namely, the input layer, embedding layer, input attention layer, merging layer, BLSTM layer and softmax layer. The DDI-2013 corpus [178] was used in this study, which consisted of texts describing drugs, and the drug pair in each sentence was manually labeled as either noninteracting or interacting. Firstly, the DDI-2013 corpus was divided into a training set and a test set. For a sentence containing drugs $d_i$ and $d_j$ in the training set, three kinds of information [i.e. the word itself, part of speech (POS) and relative distances between the word and each candidate drug in the sentence] were extracted for each word through the input layer and encoded into real-valued vectors (i.e. word embedding vectors, POS embedding vectors and position embedding vectors) by the embedding layer through looking up the corresponding embedding dictionaries. Secondly, given that attention mechanisms could be used to quantify the effect of each word on the meaning of the sentence, the input attention layer was used to weigh the word embedding vectors. Thirdly, in the merging layer, each word was represented by a vector obtained by integrating the corresponding three types of embedding vectors. Then, the vectors representing the words were arranged into a sequence of vectors, which was input into the BLSTM layer to learn the global semantic representation of the sentence. Finally, the global representation of the sentence was fed into the softmax layer to predict potential DDIs.
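The position information fed to the embedding layer can be illustrated with a short sketch that computes, for every word, its relative distance to each of the two candidate drugs. The sign convention (word index minus drug index), the placeholder tokens and the function name are assumptions for illustration.

```python
def relative_positions(tokens, drug_i_index, drug_j_index):
    """For each token, return (distance to d_i, distance to d_j) as signed offsets."""
    return [(k - drug_i_index, k - drug_j_index) for k in range(len(tokens))]

# Toy sentence in which the candidate drugs are replaced by placeholder tokens.
tokens = ["DRUG1", "may", "increase", "the", "effect", "of", "DRUG2"]
positions = relative_positions(tokens, drug_i_index=0, drug_j_index=6)
```

Each offset would then be looked up in a position embedding dictionary in the same way as words and POS tags.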

Position-aware multi-task deep learning method based on BLSTM
To automatically extract DDIs from biomedical texts, Zhou et al. [179] developed a position-aware multi-task deep learning method based on BLSTM (PM-BLSTM), whose architecture mainly consisted of four parts: the embedding layer, BLSTM layer, position-aware attention layer and multi-task output layer. The DDI-2013 corpus was also used in this study. However, unlike in Att-BLSTM, where drug pairs were extracted directly from the sentences, for sentences involving more than two drugs, the authors used the rules proposed by Liu et al. [180] to filter the drugs so that only one drug pair in each sentence was studied. For a sentence containing drugs $d_i$ and $d_j$, the embedding layer generated the corresponding embedding vector for each word through a word embedding dictionary and a position embedding dictionary. Then, the embedding vectors of all words were fed into the BLSTM layer to obtain the matrix composed of the hidden representation of each word. Considering that generating attention based only on local semantic information is inaccurate for a long sentence, the authors utilized the position-aware attention layer to fuse the hidden representations of all words into the sentence representation. Finally, the sentence representation was fed into the softmax-based classifiers in the multi-task output layer to identify potential DDIs.

A two-stage DDIs extraction model
Huang et al. [181] developed a two-stage method based on SVM and long short-term memory (LSTM) [182] to extract DDIs, where the SVM classifier was applied to identify potential DDIs, while the LSTM-based classifier was used to predict the type of each DDI according to the text description (Figure 8). The types include advise (two drugs are suggested to be taken together in the text), effect (the effects of two drugs taken together are described in the text), mechanism (the pharmacokinetic mechanism of the DDI is introduced in the text) and int (there is no additional information about the DDI in the text). Specifically, in the first stage, a feature definition approach [183] was used to extract the features of each sentence in the DDI-2013 corpus, including the context word feature, pattern feature, verb feature, syntactic feature and auxiliary features. Then, the binary SVM classifier was used to classify the sentences into positive and negative instances. The drug pairs involved in the two kinds of instances were regarded as positive DDIs and negative DDIs, respectively. In the second stage, the authors used the GDEP parser [184] to get the stem, POS tag, syntactic chunk and biomedical entity of the positive instances, based on which the word representation model [185] was applied to obtain the word embedding, stem embedding, POS embedding, chunk embedding and entity embedding of each word. Finally, the above embeddings of all words in each positive instance were fed into the LSTM-based classifier to predict the DDI type.

Instance position embedding and key external text for DDI extraction
Dou et al. [186] developed a framework named instance position embedding and key external text for DDI extraction (IK-DDI), where the instance position embedding was applied to extract DDI information from the DDI Extraction 2013 [187] database, while the key external text describing drugs was derived from DrugBank. Firstly, the sentence (containing drugs $d_i$ and $d_j$) recorded in DDI Extraction 2013 was input into a module composed of the Embedding, BiLSTM, CNN and MaxPooling layers to obtain the feature vector $f^{int}_{ij}$ of the drug pair $(d_i, d_j)$. Secondly, given that the same drug may have different names in different texts, for drug $d_i$ in the drug pair $(d_i, d_j)$, the authors first calculated the word string similarity $SR_{ik}$ and the word sense similarity $SE_{ik}$ between the string of $d_i$ and the string of drug $d_k$ in DrugBank according to the method presented in the study [188]. Then, the comprehensive similarity between $d_i$ and $d_k$ was defined by combining $SR_{ik}$ and $SE_{ik}$. After calculating the comprehensive similarities between $d_i$ and all drugs in DrugBank, the drug with the highest comprehensive similarity to $d_i$ was regarded as the matched drug of $d_i$. The matched drug of $d_j$ was obtained in a similar way. Next, the authors performed 'Search key external text' in DrugBank to mine two key sentences containing the matched drugs of $d_i$ and $d_j$, respectively, which were input into a module consisting of the Embedding, CNN and MaxPooling layers to get the feature vector $f^{ext}_{ij}$ of the drug pair $(d_i, d_j)$. Then, $f^{int}_{ij}$ and $f^{ext}_{ij}$ were input into the fully connected layer to obtain the final feature vector $f_{ij}$ of the drug pair. Finally, based on $f_{ij}$, the softmax classification function was applied to calculate the interaction probability between $d_i$ and $d_j$.

3D graph and text-based neural network for drug-drug interaction prediction
By integrating the 3D GNN and the pretrained text attention mechanism, Chen et al. [189] constructed a model named 3D graph and text-based neural network for drug-drug interaction prediction (3DGT-DDI). Firstly, through a force field optimization algorithm called MMFF [190], the authors obtained the 3D structure conformation of drug $d_i$ based on its SMILES, which was fed into the 3D GNN to get the structure-based feature of $d_i$. Then, the structure-based features of drugs $d_i$ and $d_j$ were integrated as the feature vector of the drug pair $(d_i, d_j)$. Secondly, SciBERT, a variant of bidirectional encoder representations from transformers (BERT) pretrained on scientific articles, was applied to tokenize the text describing the drug pair $(d_i, d_j)$ recorded in DDI Extraction 2013. Then, the tokenized text was used as the input of a CNN to obtain the text-based feature vector of the drug pair. Finally, the above two types of feature vectors of the drug pair were input into the DNN to obtain the interaction probability between $d_i$ and $d_j$.

Score function-based models
Given the successful application of score function-based models in the field of bioinformatics [42-44, 191], some researchers have developed models based on score functions to identify potential DDIs. The advantage of score function-based models is that the algorithm theory and calculation process involved are relatively easy to understand. Moreover, this type of model does not require negative samples. However, most score function-based models make predictions based on known DDIs, so they are not applicable to new drugs. In addition, when this kind of model is used to predict DDIs, it is usually necessary to make assumptions about the probability distribution of DDIs, and if the data are inconsistent with the assumptions, the prediction accuracy of the models is severely affected.

Russell-Rao-based model
To predict potential DDIs, Ferdousi et al. [192] proposed a computational model based on the Russell-Rao method [193]. Specifically, according to the known associations between drugs and four types of biological elements (including 23 carriers, 115 transporters, 235 enzymes and 1787 targets), the authors constructed four corresponding binary vectors for each drug. Given that some proteins were shared among these four types of biological elements, the authors removed the redundant proteins and spliced the four types of binary vectors to construct a comprehensive 2004-dimensional binary vector for each drug. Then, the Russell-Rao method [193] was used to calculate the interaction probability $P(d_i, d_j)$ between drugs $d_i$ and $d_j$ from $V_{d_i}$ and $V_{d_j}$, which represent the comprehensive binary vectors of drugs $d_i$ and $d_j$, respectively.
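For orientation, the standard Russell-Rao coefficient of two binary vectors is the number of positions in which both vectors are 1, divided by the vector length. The sketch below implements this textbook definition; it is assumed, not confirmed by the text above, that the authors' score has exactly this form, and the toy vectors are far shorter than the 2004-dimensional vectors used in the study.

```python
def russell_rao(u, v):
    """Russell-Rao coefficient: shared 1-positions divided by the vector length."""
    if len(u) != len(v) or not u:
        raise ValueError("vectors must be non-empty and of equal length")
    shared = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    return shared / len(u)

# Toy comprehensive binary vectors of two drugs.
v_i = [1, 1, 0, 0, 1]
v_j = [1, 0, 0, 1, 1]
score = russell_rao(v_i, v_j)
```

Unlike the Jaccard index, this coefficient keeps positions where both drugs lack an association in the denominator, so sparse vectors produce small scores.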

Score matrix and PCA-based model
Vilar et al. [194] proposed a DDI prediction model by constructing score matrices from an adjacency matrix and similarity matrices (Figure 9). Firstly, based on the drug-related information downloaded from DrugBank (including 2D structural fingerprints [195], interaction profile fingerprints [196], target profile fingerprints [55] and adverse drug effects (ADEs) profile fingerprints [197]), the authors constructed the corresponding binary vector for each drug and then calculated the similarities between drugs using the Jaccard index for each type of information. Besides, the authors calculated the 3D structure-based similarities between drugs with the Phase package. The five corresponding similarity matrices were denoted by $M^i_1$ ($i = 1, 2, 3, 4, 5$). Secondly, they defined the original score matrix $M^i_2$ based on $M^i_1$ and the $n \times n$ adjacency matrix A, where n refers to the total number of drugs and A was constructed based on known DDIs. Since the matrix $M^i_2$ is not symmetric, the authors performed a symmetric transformation on $M^i_2$ to obtain the final score matrix $M^i_3$. Finally, the PCA method was used to integrate the score matrices (i.e. $M^1_3$, $M^2_3$, $M^3_3$, $M^4_3$, $M^5_3$) to obtain the interaction probability matrix.

DISCUSSION AND CONCLUSION
In clinical treatment, patients usually take two or more drugs in order to cure the disease as soon as possible. The combination of some drugs can not only increase the efficacy of the drugs but also delay the emergence of resistance [198]. However, inappropriate drug combinations may not only fail to achieve the expected therapeutic effect but also cause adverse reactions or even toxic reactions. Many drugs have been withdrawn from the market due to serious adverse reactions caused by DDIs, which not only harms patients but also brings huge economic losses to pharmaceutical companies. For example, in 1998 the calcium channel blocker mibefradil was withdrawn from the market because it could lead to lethal DDIs by inhibiting the cytochrome P450 3A4 metabolism of certain drugs [199]. For this reason, manufacturers are required to specify DDIs strictly in drug instructions, and consumers must carefully read the instructions when using drugs. As more and more new drugs are approved for clinical treatment, the number of potential DDIs increases rapidly. However, due to time and money constraints, a large number of potential DDIs that may cause adverse reactions are not provided in drug instructions. With the deepening understanding of drug metabolism mechanisms and the rapid accumulation of drug-related data, more and more researchers have committed to building computational models to predict potential DDIs.
In this review, we first introduced the basic conception and classification of DDIs. Some important publicly available databases and web servers about experimentally verified or predicted DDIs were also briefly described. Besides, we summarized three types of prediction models proposed during recent years and discussed their advantages as well as limitations. Finally, we pointed out the problems that need to be solved in future research on DDI prediction and provided corresponding suggestions. In general, this review helps researchers gain a comprehensive understanding of DDI prediction and provides valuable guidance for their research, especially in the construction of models: they can weigh the advantages and disadvantages of various models to build the most suitable model for their own studies.
… pair according to the information about the chemical structure and target protein of drugs. Then, the feature vector was fed into the random forest model to calculate the interaction probability between the corresponding drugs.
In the deep learning-based models, researchers applied different techniques (e.g. CNNs, DNNs, GNNs and graph embedding) to predict DDIs based on drug-related information or used relevant NLP methods to mine potential DDIs from texts. Taking MCFF-MTDDI [146] as an example, after the chemical structure-based feature and KG fusion representations of each drug pair were obtained, the two types of features were input into a GRU-based multichannel feature fusion framework to obtain the fused feature vector, which was used as the input of the multi-class classifier to implement the multi-class classification tasks. Besides, the extra label-based feature vector and the fused feature vector of the drug pair were concatenated and input into the multi-label classifier to implement the multi-label classification. In the two-stage model constructed by Huang et al. [181], after defining the features of each sentence, the authors first divided the sentences into positive instances and negative instances using SVM. Then, they used the word representation model to obtain multiple embeddings of each word in the positive instances, which were fed into the LSTM-based classifier to predict the type of the corresponding DDIs.
In the score function-based models, the authors defined score functions from different perspectives to calculate the interaction probabilities between drugs. For example, in the model constructed based on the Russell-Rao method, the authors constructed binary vectors for each drug according to four kinds of association information about drugs and integrated them into a comprehensive binary vector. Finally, the Russell-Rao method was used to calculate the interaction probability based on the comprehensive binary vectors of the two drugs.
Next, we summarized the advantages and disadvantages of the above three types of models. The traditional machine learning-based models can be used to make large-scale rapid predictions of potential DDIs. Besides, the main advantage of this type of model is that it is suitable for new drugs; for example, TMFUF [99] can be used to predict DDIs between new drugs. However, these models still have some limitations. For example, the traditional machine learning-based models involve multiple parameters, and the setting of parameter values limits the model performance to some extent. Besides, researchers tended to define the feature vectors of drug pairs based on the similarities between drugs, so constructing more informative feature vectors is still an urgent problem to be solved. Moreover, given that negative samples were extremely difficult to obtain, researchers treated candidate samples as negative samples to train this type of model, which limited the performance of the models to some extent. Compared with traditional machine learning methods, deep learning methods can automatically mine the significant features of drugs. In addition, deep learning-based models have high flexibility in feature fusion. For instance, in MDF-SA-DDI [159], the self-attention mechanism was used to fuse the feature vectors of drug pairs, while a GRU-based multichannel feature fusion framework was applied to integrate the feature vectors of drug pairs in MCFF-MTDDI [146]. Compared with the simple splicing of feature vectors, these fusion methods can fuse features more fully. However, making predictions with deep learning-based models often takes more time. As with models constructed based on the traditional machine learning algorithms, the scarcity of reliable negative samples severely limits the performance of the models. Besides, deep learning-based models lack interpretability. The advantage of score function-based models is that the algorithm theory and calculation process involved are relatively easy to understand. Moreover, this type of model does not require negative samples. However, they still have some deficiencies. For instance, most score function-based models are not suitable for new drugs. In addition, when these models are applied to predict potential DDIs, assumptions about the probability distribution are often required, and if the data are inconsistent with the assumptions, the prediction accuracy of the models will be severely affected.
Considering the advantages and disadvantages of each type of model, in our opinion, it is best for researchers to build models based on deep learning algorithms in future research. Compared with the other two types of models, the computational efficiency of deep learning-based models is lower, but their accuracy is generally higher. Besides, deep learning-based models can predict interactions between drug substructures, which helps us understand the mechanism of DDIs. NLP models can not only be used to mine a large number of DDIs from the literature but also help researchers learn detailed information about DDIs. As for the disadvantages of this type of model, instead of randomly selecting drug pairs without known interactions as negative samples, designing methods to identify reliable negative samples is more conducive to further improving the accuracy of the models. Given the lack of interpretability in most deep learning-based models, it is necessary to perform interpretability analysis on these models. In addition, since the values of model parameters affect the prediction performance to a certain extent, designing methods to determine the optimal parameter values would help further improve the prediction accuracy.
To evaluate the predictive performance of computational models, most researchers conducted k-fold cross validation and case studies. Other methods have also been used to evaluate performance. For example, in MCFF-MTDDI [146], Chen et al. applied seven classical evaluation indicators (Accuracy, Macro-Precision, Macro-Recall, Macro-F1, Cohen's Kappa, AUC and AUPR) to evaluate the prediction performance of the model in the multi-class classification task, while AUC and AUPR were used to evaluate the performance of the model in the multi-label prediction tasks. In BioDKG-DDI [154], the Matthews correlation coefficient (MCC) was used to evaluate the effectiveness of the model. These indicators are widely accepted for assessing the performance of DDI prediction models. Besides, some researchers performed ablation studies to assess the effect of each module on predictive performance. For example, in SSI-DDI [129], the authors removed the co-attention layer and changed the number of GAT layers to evaluate the effect of the corresponding two modules on the prediction accuracy, respectively.
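As an illustration of one of the indicators mentioned above, the following sketch computes the MCC for a binary DDI prediction task from the four entries of a confusion matrix, using the standard definition. The counts are invented for illustration, and returning 0.0 when the denominator vanishes is a common convention assumed here.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator

# Hypothetical confusion matrix of a binary DDI classifier.
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
```

Because the MCC uses all four entries of the confusion matrix, it is less sensitive to class imbalance than accuracy, which matters when interacting pairs are far rarer than candidate pairs.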
Given that discovering potential DDIs benefits both drug development and clinical treatment, researchers have developed several DDI prediction models with superior performance. However, some problems remain to be solved in the future. Firstly, the data are extremely unbalanced, i.e. the number of positive samples is much smaller than the number of candidate samples. Therefore, it is necessary to collect more drug-drug pairs with known interactions as positive samples in future work. Secondly, current studies focus on identifying potential DDIs and predicting DDI types, whereas less research has been done on the severity of DDIs. Moreover, existing prediction models do not take into account the effect of drug dose on DDIs. Thirdly, current models can only predict interactions between two drugs, which is far from enough. In clinical treatment, patients often need to take more than two drugs, so it is of great significance to study the interactions among multiple drugs. To address the second and third problems, it is necessary to build corresponding databases that provide the data foundation for subsequent studies. Fourthly, as more and more drug-related data are generated, applying deep learning algorithms to merge them effectively is expected to further improve prediction accuracy. Finally, many models make predictions based only on the similarity and association information of drugs, and therefore cannot reveal the mechanisms of interactions. Introducing text information that describes the drugs is expected to alleviate this problem.

Key Points
• We introduced the basic conception and classification of DDIs. In addition, several databases and web servers about DDIs were introduced.
• Paying attention to DDIs is of great significance to adopt effective combination therapy and further improve the quality of medical treatment.
• Revealing DDIs through experiments is extremely time-consuming and costly, and building computational models to calculate the interaction probabilities between drugs could be an important complement to experimental methods.
• Based on the calculation principle of the models, we simply divided the models into three categories: traditional machine learning-based models, deep learning-based models and score function-based models.
• We briefly discussed the advantages and limitations of existing computational models and put forward the existing problems in current DDI prediction research, which need to be resolved in future work.

Figure 1. The flowchart of the Bayesian probabilistic method-based model, where the interaction probabilities between drugs are calculated by integrating the system connection score and phenotypic similarity score through a Bayesian probabilistic method.

Figure 2. The flowchart of the meta-learning-based model built based on representation learning, PUL and meta-learning.

Figure 3. The flowchart of the gradient boosting-based model, where the XGBoost classifier is applied to predict potential DDIs based on multiple features of drug pairs.

Figure 4. The flowchart of SSI-DDI, which is applied to predict potential DDIs based on the interactions between drug substructures.

Figure 5. The flowchart of MCFF-MTDDI, which consists of three modules: feature extraction module, feature fusion module and classifier module.

Figure 6. The flowchart of BioDKG-DDI, where the self-attention mechanism is used to fuse the features of two drugs to obtain the features of the corresponding drug pair, which are input into the DNN to obtain the corresponding interaction probability.

Figure 7. The flowchart of MDF-SA-DDI constructed based on multisource feature fusion and self-attention mechanism.

Figure 8. The flowchart of the two-stage DDIs extraction model built based on SVM and LSTM.

Figure 9. The flowchart of the score matrix and PCA-based model, where the score matrices are calculated separately based on different similarities, and PCA is used to integrate all the score matrices into the final interaction probability matrix.

Table 1: The function and URL of databases as well as web servers
… drugs and providing more than 200 data fields for each drug, with half of the information devoted to chemical, pharmacological, pharmaceutical and other aspects of the drug and the other half dedicated to documenting the sequence, structure and pathway of the drug target. … php)

Table 2: The significance and related link of computational models
Model | Significance | Link to the GitHub or sites
Bayesian probabilistic method-based model [13] | Introducing the system connection score and drug phenotypic similarity score | http://www.picb.ac.cn/hanlab/DDI
INDI [57] | Applying a novel scoring scheme to construct the feature vectors of drug pairs based on multiple types of drug similarity |